An Analysis of Beer, Beer Styles, and Breweries

by Carl Smith

Table of Contents:

Preliminary Wrangling

This analysis will be exploring a dataset of about 1.5 million beer reviews from BeerAdvocate. In addition, the database/API called Open Brewery DB is also used for more detailed brewery data.

A few notes about the dataset:

  • Each row contains one unique review from one unique reviewer. Each beer can have more than one review from the same reviewer or a different reviewer.
  • Some key column definitions:
    • weighted_review: engineered variable that calculates a weighted average of all 5 review categories. This is the primary review score that is listed on BeerAdvocate's website. Each weight is defined by BeerAdvocate as 20%, 24%, 6%, 10% and 40% respectively to the following review categories.
    • review_overall: overall review score given for overall impression of the beer.
    • review_aroma: review score given for any malt, hops, yeast, and other aromatics present in the beer.
    • review_appearance: review score given for the beer's color, clarity, head retention, and lacing.
    • review_palate: review score given for the beer's body, carbonation, warmth, creaminess, astringency and other palate sensations.
    • review_taste: review score given for any malt, hops, fermentation byproducts, balance, finish or aftertaste and other flavor characteristics.
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
In [2]:
beer_df = pd.read_csv('beer_reviews.csv')
beer_df
Out[2]:
brewery_id brewery_name review_time review_overall review_aroma review_appearance review_profilename beer_style review_palate review_taste beer_name beer_abv beer_beerid
0 10325 Vecchio Birraio 1234817823 1.5 2.0 2.5 stcules Hefeweizen 1.5 1.5 Sausa Weizen 5.0 47986
1 10325 Vecchio Birraio 1235915097 3.0 2.5 3.0 stcules English Strong Ale 3.0 3.0 Red Moon 6.2 48213
2 10325 Vecchio Birraio 1235916604 3.0 2.5 3.0 stcules Foreign / Export Stout 3.0 3.0 Black Horse Black Beer 6.5 48215
3 10325 Vecchio Birraio 1234725145 3.0 3.0 3.5 stcules German Pilsener 2.5 3.0 Sausa Pils 5.0 47969
4 1075 Caldera Brewing Company 1293735206 4.0 4.5 4.0 johnmichaelsen American Double / Imperial IPA 4.0 4.5 Cauldron DIPA 7.7 64883
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1586609 14359 The Defiant Brewing Company 1162684892 5.0 4.0 3.5 maddogruss Pumpkin Ale 4.0 4.0 The Horseman's Ale 5.2 33061
1586610 14359 The Defiant Brewing Company 1161048566 4.0 5.0 2.5 yelterdow Pumpkin Ale 2.0 4.0 The Horseman's Ale 5.2 33061
1586611 14359 The Defiant Brewing Company 1160702513 4.5 3.5 3.0 TongoRad Pumpkin Ale 3.5 4.0 The Horseman's Ale 5.2 33061
1586612 14359 The Defiant Brewing Company 1160023044 4.0 4.5 4.5 dherling Pumpkin Ale 4.5 4.5 The Horseman's Ale 5.2 33061
1586613 14359 The Defiant Brewing Company 1160005319 5.0 4.5 4.5 cbl2 Pumpkin Ale 4.5 4.5 The Horseman's Ale 5.2 33061

1586614 rows × 13 columns

In [3]:
beer_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1586614 entries, 0 to 1586613
Data columns (total 13 columns):
 #   Column              Non-Null Count    Dtype  
---  ------              --------------    -----  
 0   brewery_id          1586614 non-null  int64  
 1   brewery_name        1586599 non-null  object 
 2   review_time         1586614 non-null  int64  
 3   review_overall      1586614 non-null  float64
 4   review_aroma        1586614 non-null  float64
 5   review_appearance   1586614 non-null  float64
 6   review_profilename  1586266 non-null  object 
 7   beer_style          1586614 non-null  object 
 8   review_palate       1586614 non-null  float64
 9   review_taste        1586614 non-null  float64
 10  beer_name           1586614 non-null  object 
 11  beer_abv            1518829 non-null  float64
 12  beer_beerid         1586614 non-null  int64  
dtypes: float64(6), int64(3), object(4)
memory usage: 157.4+ MB

Let's clean up our dataframe a bit, by removing some unwanted columns and dealing with missing values.

In [4]:
# remove unwanted columns, and rearrange some columns. 
beer_df_clean = beer_df.drop(columns=['review_time', 'review_profilename'])
beer_df_clean = beer_df_clean[['brewery_id', 'brewery_name', 'beer_beerid', 'beer_name', 'beer_style', 'beer_abv', 
                              'review_overall', 'review_aroma', 'review_appearance', 'review_palate', 'review_taste']]
beer_df_clean
Out[4]:
brewery_id brewery_name beer_beerid beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate review_taste
0 10325 Vecchio Birraio 47986 Sausa Weizen Hefeweizen 5.0 1.5 2.0 2.5 1.5 1.5
1 10325 Vecchio Birraio 48213 Red Moon English Strong Ale 6.2 3.0 2.5 3.0 3.0 3.0
2 10325 Vecchio Birraio 48215 Black Horse Black Beer Foreign / Export Stout 6.5 3.0 2.5 3.0 3.0 3.0
3 10325 Vecchio Birraio 47969 Sausa Pils German Pilsener 5.0 3.0 3.0 3.5 2.5 3.0
4 1075 Caldera Brewing Company 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.0 4.5 4.0 4.0 4.5
... ... ... ... ... ... ... ... ... ... ... ...
1586609 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 5.0 4.0 3.5 4.0 4.0
1586610 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.0 5.0 2.5 2.0 4.0
1586611 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.5 3.5 3.0 3.5 4.0
1586612 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.0 4.5 4.5 4.5 4.5
1586613 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 5.0 4.5 4.5 4.5 4.5

1586614 rows × 11 columns

In [5]:
# deal with missing values in the beer_abv column. 
beer_df_clean.beer_abv.isna().sum()
Out[5]:
67785
In [6]:
beer_df_clean[beer_df_clean['beer_abv'].isna()]
Out[6]:
brewery_id brewery_name beer_beerid beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate review_taste
273 1075 Caldera Brewing Company 21241 Cauldron Espresso Stout American Stout NaN 3.0 3.0 3.0 4.0 3.0
430 850 Moon River Brewing Company 20689 The Highland Stagger Scotch Ale / Wee Heavy NaN 3.5 4.0 4.5 3.5 3.5
603 850 Moon River Brewing Company 20689 The Highland Stagger Scotch Ale / Wee Heavy NaN 4.0 3.5 4.0 3.5 3.5
733 1075 Caldera Brewing Company 54723 Alpha Beta American IPA NaN 4.0 4.0 4.0 4.0 4.0
798 1075 Caldera Brewing Company 42964 Imperial Stout American Double / Imperial Stout NaN 4.5 4.5 4.0 4.0 4.5
... ... ... ... ... ... ... ... ... ... ... ...
1586568 14359 The Defiant Brewing Company 36424 Bock Bock NaN 4.0 3.5 4.0 4.0 4.0
1586587 14359 The Defiant Brewing Company 36555 Maibock Maibock / Helles Bock NaN 3.5 4.5 4.0 4.5 4.0
1586596 14359 The Defiant Brewing Company 48360 Resolution #2 Belgian Strong Pale Ale NaN 4.0 3.0 5.0 4.0 3.5
1586597 14359 The Defiant Brewing Company 48360 Resolution #2 Belgian Strong Pale Ale NaN 4.5 4.5 4.0 4.0 4.0
1586598 14359 The Defiant Brewing Company 48360 Resolution #2 Belgian Strong Pale Ale NaN 4.0 4.0 3.5 4.0 4.5

67785 rows × 11 columns

In [7]:
# drop these rows. 
beer_df_clean.dropna(subset=['beer_abv'], inplace=True)
In [8]:
beer_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1518829 entries, 0 to 1586613
Data columns (total 11 columns):
 #   Column             Non-Null Count    Dtype  
---  ------             --------------    -----  
 0   brewery_id         1518829 non-null  int64  
 1   brewery_name       1518814 non-null  object 
 2   beer_beerid        1518829 non-null  int64  
 3   beer_name          1518829 non-null  object 
 4   beer_style         1518829 non-null  object 
 5   beer_abv           1518829 non-null  float64
 6   review_overall     1518829 non-null  float64
 7   review_aroma       1518829 non-null  float64
 8   review_appearance  1518829 non-null  float64
 9   review_palate      1518829 non-null  float64
 10  review_taste       1518829 non-null  float64
dtypes: float64(6), int64(2), object(3)
memory usage: 139.1+ MB

We also have a few missing values in the brewery_name column. Let's go ahead and drop those as well.

In [9]:
beer_df_clean.dropna(subset=['brewery_name'], inplace=True)
beer_df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1518814 entries, 0 to 1586613
Data columns (total 11 columns):
 #   Column             Non-Null Count    Dtype  
---  ------             --------------    -----  
 0   brewery_id         1518814 non-null  int64  
 1   brewery_name       1518814 non-null  object 
 2   beer_beerid        1518814 non-null  int64  
 3   beer_name          1518814 non-null  object 
 4   beer_style         1518814 non-null  object 
 5   beer_abv           1518814 non-null  float64
 6   review_overall     1518814 non-null  float64
 7   review_aroma       1518814 non-null  float64
 8   review_appearance  1518814 non-null  float64
 9   review_palate      1518814 non-null  float64
 10  review_taste       1518814 non-null  float64
dtypes: float64(6), int64(2), object(3)
memory usage: 139.1+ MB
In [10]:
beer_df_clean
Out[10]:
brewery_id brewery_name beer_beerid beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate review_taste
0 10325 Vecchio Birraio 47986 Sausa Weizen Hefeweizen 5.0 1.5 2.0 2.5 1.5 1.5
1 10325 Vecchio Birraio 48213 Red Moon English Strong Ale 6.2 3.0 2.5 3.0 3.0 3.0
2 10325 Vecchio Birraio 48215 Black Horse Black Beer Foreign / Export Stout 6.5 3.0 2.5 3.0 3.0 3.0
3 10325 Vecchio Birraio 47969 Sausa Pils German Pilsener 5.0 3.0 3.0 3.5 2.5 3.0
4 1075 Caldera Brewing Company 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.0 4.5 4.0 4.0 4.5
... ... ... ... ... ... ... ... ... ... ... ...
1586609 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 5.0 4.0 3.5 4.0 4.0
1586610 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.0 5.0 2.5 2.0 4.0
1586611 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.5 3.5 3.0 3.5 4.0
1586612 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.0 4.5 4.5 4.5 4.5
1586613 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 5.0 4.5 4.5 4.5 4.5

1518814 rows × 11 columns

That looks better.


Now that we have our df cleaned up a bit, I am interested in trying to get some more data for breweries, such as location and type of brewery.

My idea is to see if there is a city or state that has a high correaltion with overall rating. Then I want to try and make a geospatial visualization of which states or cities receive the highest ratings.

I will be using a python wrapper for the Open Brewery DB API, called openbrewery, to get this data.

In [11]:
# import openbrewerydb as ob

# brewery_df = ob.load()
# brewery_df.to_csv('openbrewerydb.csv', index=False)
# brewery_df

Since the above import statement won't work unless you install the wrapper, I saved the data to a csv in this project file.

In [12]:
brewery_df = pd.read_csv('openbrewerydb.csv')
brewery_df
Out[12]:
id name brewery_type street address_2 address_3 city state county_province postal_code country longitude latitude phone website_url updated_at created_at
0 2 Avondale Brewing Co micro 201 41st St S NaN NaN Birmingham Alabama NaN 35222-1932 United States -86.774322 33.524521 2.057775e+09 http://www.avondalebrewing.com 2018-08-23T23:19:57.825Z 2018-07-24T01:32:47.255Z
1 44 Trim Tab Brewing micro 2721 5th Ave S NaN NaN Birmingham Alabama NaN 35233-3401 United States -86.791400 33.512849 2.057031e+09 http://www.trimtabbrewing.com 2018-08-23T23:20:31.423Z 2018-07-24T01:32:47.815Z
2 46 Yellowhammer Brewery micro 2600 Clinton Ave W NaN NaN Huntsville Alabama NaN 35805-3046 United States -86.593201 34.727752 2.569756e+09 http://www.yellowhammerbrewery.com 2018-08-23T23:20:33.102Z 2018-07-24T01:32:47.838Z
3 55 Bearpaw River Brewing Co micro 4605 E Palmer Wasilla Hwy NaN NaN Wasilla Alaska NaN 99654-7679 United States -149.412710 61.575269 NaN http://bearpawriverbrewing.com 2018-08-23T23:20:40.743Z 2018-07-24T01:32:47.967Z
4 76 King Street Brewing Co micro 9050 King Street NaN NaN Anchorage Alaska NaN 99515 United States -149.879076 61.138489 9.073365e+09 http://www.kingstreetbrewing.com 2018-08-23T23:20:57.179Z 2018-07-24T01:32:48.301Z
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8027 7923 Public Craft Brewing Co micro 716 58th St NaN NaN Kenosha Wisconsin NaN 53140-4137 United States -87.819751 42.583154 2.626523e+09 http://www.publiccraftbrewing.com 2018-08-24T16:44:56.034Z 2018-07-24T01:34:36.293Z
8028 7958 Stevens Point Brewery regional 2617 Water St NaN NaN Stevens Point Wisconsin NaN 54481-5248 United States -89.573981 44.510263 7.153449e+09 http://www.pointbeer.com 2018-08-24T16:45:45.320Z 2018-07-24T01:34:36.742Z
8029 7973 Titletown Brewing Co brewpub 200 Dousman St NaN NaN Green Bay Wisconsin NaN 54303-2712 United States -88.018517 44.518920 9.204372e+09 http://www.titletownbrewing.com 2018-08-24T16:46:06.736Z 2018-07-24T01:34:36.936Z
8030 8000 Black Tooth Brewing Co micro 312 Broadway St NaN NaN Sheridan Wyoming NaN 82801-3917 United States -106.953506 44.800895 3.076752e+09 http://www.blacktoothbrewingcompany.com 2018-08-24T16:46:40.667Z 2018-07-24T01:34:37.335Z
8031 8007 Gillette Brewing Company brewpub 301 S Gillette Ave NaN NaN Gillette Wyoming NaN 82716-3705 United States -105.503614 44.292038 3.076709e+09 http://www.gillettebrewingcompany.com 2018-08-24T16:46:48.215Z 2018-07-24T01:34:37.423Z

8032 rows × 17 columns

In [13]:
# rename 'name' column. 
brewery_df.rename(columns = {'name':'brewery_name'}, inplace=True)
In [14]:
# number of brewery name matches between our two dataframes.
a = beer_df_clean['brewery_name'].unique().tolist()
b = brewery_df['brewery_name'].unique().tolist()
matches = list(set(a).intersection(b))
len(matches)
Out[14]:
268

Ok so there are quite a few breweries that didn't match between the two dataframes. However, there is a good chance that there are more breweries that do match but there is some small difference in the name, for example 'Co.' instead of 'Company'.

My first idea was to loop through beer_df_clean.brewery_name column, split the first name from each brewery, and then search for breweries with this first name in brewery_df.brewery_name.

Unfortunately this plan didn't workout. Therefore I will just do some physical searching of similar brewery names, with as much programmatic help wherever possible.

In [15]:
# first we need a list of breweries that did not initially match from the beer dataframe. 
brewery_temp = beer_df_clean[~beer_df_clean.brewery_name.isin(matches)]
brewery_temp
Out[15]:
brewery_id brewery_name beer_beerid beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate review_taste
0 10325 Vecchio Birraio 47986 Sausa Weizen Hefeweizen 5.0 1.5 2.0 2.5 1.5 1.5
1 10325 Vecchio Birraio 48213 Red Moon English Strong Ale 6.2 3.0 2.5 3.0 3.0 3.0
2 10325 Vecchio Birraio 48215 Black Horse Black Beer Foreign / Export Stout 6.5 3.0 2.5 3.0 3.0 3.0
3 10325 Vecchio Birraio 47969 Sausa Pils German Pilsener 5.0 3.0 3.0 3.5 2.5 3.0
4 1075 Caldera Brewing Company 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.0 4.5 4.0 4.0 4.5
... ... ... ... ... ... ... ... ... ... ... ...
1586609 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 5.0 4.0 3.5 4.0 4.0
1586610 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.0 5.0 2.5 2.0 4.0
1586611 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.5 3.5 3.0 3.5 4.0
1586612 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 4.0 4.5 4.5 4.5 4.5
1586613 14359 The Defiant Brewing Company 33061 The Horseman's Ale Pumpkin Ale 5.2 5.0 4.5 4.5 4.5 4.5

1370008 rows × 11 columns

In [16]:
# same thing from the brewery dataframe. 
brewery_temp2 = brewery_df[~brewery_df.brewery_name.isin(matches)]
brewery_temp2
Out[16]:
id brewery_name brewery_type street address_2 address_3 city state county_province postal_code country longitude latitude phone website_url updated_at created_at
0 2 Avondale Brewing Co micro 201 41st St S NaN NaN Birmingham Alabama NaN 35222-1932 United States -86.774322 33.524521 2.057775e+09 http://www.avondalebrewing.com 2018-08-23T23:19:57.825Z 2018-07-24T01:32:47.255Z
1 44 Trim Tab Brewing micro 2721 5th Ave S NaN NaN Birmingham Alabama NaN 35233-3401 United States -86.791400 33.512849 2.057031e+09 http://www.trimtabbrewing.com 2018-08-23T23:20:31.423Z 2018-07-24T01:32:47.815Z
2 46 Yellowhammer Brewery micro 2600 Clinton Ave W NaN NaN Huntsville Alabama NaN 35805-3046 United States -86.593201 34.727752 2.569756e+09 http://www.yellowhammerbrewery.com 2018-08-23T23:20:33.102Z 2018-07-24T01:32:47.838Z
3 55 Bearpaw River Brewing Co micro 4605 E Palmer Wasilla Hwy NaN NaN Wasilla Alaska NaN 99654-7679 United States -149.412710 61.575269 NaN http://bearpawriverbrewing.com 2018-08-23T23:20:40.743Z 2018-07-24T01:32:47.967Z
4 76 King Street Brewing Co micro 9050 King Street NaN NaN Anchorage Alaska NaN 99515 United States -149.879076 61.138489 9.073365e+09 http://www.kingstreetbrewing.com 2018-08-23T23:20:57.179Z 2018-07-24T01:32:48.301Z
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8025 7899 MoonRidge Brewpub brewpub 501 Bridge St NaN NaN Cornell Wisconsin NaN 54732-8392 United States -91.158965 45.165375 7.152391e+09 http://www.facebook.com/MoonRidge-Brew-Pub-150... 2018-08-24T16:44:23.532Z 2018-07-24T01:34:35.976Z
8027 7923 Public Craft Brewing Co micro 716 58th St NaN NaN Kenosha Wisconsin NaN 53140-4137 United States -87.819751 42.583154 2.626523e+09 http://www.publiccraftbrewing.com 2018-08-24T16:44:56.034Z 2018-07-24T01:34:36.293Z
8029 7973 Titletown Brewing Co brewpub 200 Dousman St NaN NaN Green Bay Wisconsin NaN 54303-2712 United States -88.018517 44.518920 9.204372e+09 http://www.titletownbrewing.com 2018-08-24T16:46:06.736Z 2018-07-24T01:34:36.936Z
8030 8000 Black Tooth Brewing Co micro 312 Broadway St NaN NaN Sheridan Wyoming NaN 82801-3917 United States -106.953506 44.800895 3.076752e+09 http://www.blacktoothbrewingcompany.com 2018-08-24T16:46:40.667Z 2018-07-24T01:34:37.335Z
8031 8007 Gillette Brewing Company brewpub 301 S Gillette Ave NaN NaN Gillette Wyoming NaN 82716-3705 United States -105.503614 44.292038 3.076709e+09 http://www.gillettebrewingcompany.com 2018-08-24T16:46:48.215Z 2018-07-24T01:34:37.423Z

7742 rows × 17 columns

These two dataframes provide us with all of the breweries that did not initially match. The first dataframe brewery_temp contains breweries from our main dataframe beer_df_clean. The second dataframe brewery_temp2 contains breweries from brewery_df, which contains the additional brewery data we want in the main dataframe.

Now I can create lists of brewery names from each dataframe, export them as a csv, sort them, and open the full lists in Notepad++ to physically analyze them next to each other. I will find the breweries that are present in both lists, but have a small difference in the name which didn't allow them to be easily programmatically matched. I will then save the matches I find as two new lists. I can use these two new lists to make a dictionary to rename the breweries in the brewery_df so that they can match the brewery names in beer_df_clean. Finally, I can rerun list(set(a).intersection(b)) and hopefully have more than 268 matches.

In [17]:
# create lists of these breweries
c = brewery_temp.brewery_name.unique().tolist()
d = brewery_temp2.brewery_name.unique().tolist()
In [18]:
# preview of our list. This is a list of every brewery that did not initially match.
sorted(c)
Out[18]:
["'t Hofbrouwerijke",
 '(512) Brewing Company',
 '10 Barrel Brewing Co.',
 '1516 Brewing Company',
 '16 Mile Brewing Company',
 '1648 Brewing Company Ltd',
 '1702 / The Address Brewing Co.',
 '192 Brewing Company',
 '2 Brothers Brewery',
 '2nd Shift Brewery',
 '3 Ravens Brewing',
 '3 Stars Brewing Company',
 '32 Via Dei Birrai',
 '4 Hands Brewing Co.',
 '4 Pines Brewing Company',
 '4Seasons Sports Bar & Brew Pub',
 '4th Street Brewing Co.',
 '5 Rabbit Cerveceria',
 '50 Back Brewing Company',
 '508 Gastrobrewery',
 '5280 Roadhouse and Brewery',
 '7 Seas Brewery and Taproom',
 '75th Street Brewery',
 '7venth (Seventh) Sun Brewery',
 '8 Wired Brewing Co.',
 '961 Beer',
 'A Tribbiera',
 'A.J.I. Beer Inc',
 'A1A Aleworks',
 'AB Group, Ltd.',
 'AC Golden Brewing Company',
 'ALDI Stores Australia',
 'AMB - Maître Brasseur',
 'AO Susyndar',
 'AS L&#257;&#269;pl&#275;\x9aa Alus',
 'AS Puls Brewery',
 'AS Tartu Õlletehas',
 'Aarhus Bryghus',
 'Aass Brewery',
 'Abashiri, K.K.',
 'Abbaye Mont Des Cats',
 'Abbaye de Leffe S.A.',
 'Abbey Ales Limited',
 'Abbey Bells Brewery',
 'Abbey Beverage Company',
 'Abbey Beverage Company (Monastery of Christ in the Desert)',
 'Abbey Brewing Co.',
 'Abbey Wright Brewing Co.',
 'Abbeydale Brewery',
 'Aberdeen Brewing Company',
 'Abita Brewing Co.',
 'Abteibrauerei Siegburger Brauhaus - FritzAle',
 'Academy of Fine Beers',
 'Acadie-Broue',
 'Accra Brewery Limited',
 'Acorn Brewery Of Barnsley',
 "Adelbert's Brewery",
 'Adirondack Pub & Brewery',
 'Adnams plc, Sole Bay Brewery',
 'Afanasy Pivo (Brau Service)',
 'Against The Grain Brewery & Smokehouse',
 'Agassiz Brewing Co.',
 'Ahlafors Bryggerier AB',
 'Ahornberger Landbrauerei',
 'Aiken Brewing Company',
 'Airbräu München',
 'Airdale Brewing Company',
 'Airways Brewing Company',
 'Ajeper S.A.',
 'Aksarben Brewing Co.',
 'Aktien-Brauerei Kaufbeuren AG',
 'Aktienbrauerei Flims Surselva AG',
 'Aktienbrauerei Simmerberg AG',
 'Akuna Brewery',
 'Al Ahram Beverages Company',
 "Al's of Hampden / Pizza Boy Brewing",
 'Alameda Brewhouse',
 'Alamo Beer Co.',
 'Alandale Brewing Company',
 'Alaus Darykla Kalnapilis',
 'Albani Bryggerierne (Royal Unibrew)',
 'Alcatraz Brewing Co.',
 'Alcazar Brewing Co.',
 'Aldaris',
 'AleSmith Brewing Company',
 'Ales Of Scilly Brewery',
 "Alexander Keith's",
 'Alfa Bierbrouwerij B.V.',
 'Alina S.R.L.',
 'All Stars Bakery',
 'Allagash Brewing Company',
 'Allendale Brew Company Limited',
 'Alley Kat Brewing Company',
 'Allgates Brewery',
 'Allgäuer Brauhaus AG',
 'Alltech’s Lexington Brewing and Distilling Co.',
 'Almanac Beer Co.',
 "Almond '22",
 'Alpirsbacher Klosterbräu',
 'Alt Platz Brewing Company LLC',
 'Alt-Oberurseler Brauhaus',
 'Alte Klosterbrauerei Vierzehnheiligen Brauerei Trunk',
 'Altenburger Brauerei GmbH',
 'Altes Tramdepot',
 'Altitude Chophouse & Brewery',
 'Altöttinger Hell-Bräu GmbH',
 'AmBev CentroAmerica',
 'Amager Bryghus',
 'Amarcord Birra Artigianale Italiana',
 'Ambassador Brewing Company',
 'Amber Ales Limited',
 "Amber's Brewing Co.",
 "America's Brewpub",
 'American Honor Beer Company',
 'Amherst Brewing Co.',
 'Amicas',
 'Amnesia Brewing',
 "Amos Howard's Brew Pub",
 'Amstar',
 'Amstel Brouwerij B. V.',
 'Amsterdam Brewing Company',
 'Amundsen Bryggeri & Spiseri',
 'An Teallach Ale Co.',
 'Anacapa Brewing Company',
 'Anacortes Brewery & Rockfish Grill',
 'Anadolu Efes Brewery',
 'Anchor Brewing Company',
 'Anchorage Brewing Company',
 'Ancient Lakes Brewing Company',
 'Anderson Valley Brewing Company',
 'Andwell Brewing Company',
 'Angel City Brewing',
 'Anglo Dutch Brewery',
 'Angry Cedar Brewing Company',
 'Angry Minnow Brewpub',
 'Anheuser-Busch',
 'Ankerbrauerei Nagold GmbH',
 'Ankerbräu Nördlingen',
 'Antarctica (Companhia Brasileira De Bebidas)',
 'Antares',
 'Antigua Brewery Ltd.',
 'Anxanum Brewing Company',
 'Aotearoa Breweries NZ Ltd.',
 'Apatinska Pivara A.D.',
 'Appalachian Brewing Company',
 'Apu Brewery',
 'Arbor Ales',
 'Arbor Brewing Company',
 'Arcadia Brewing Company',
 'Arcense Stoombierbouwerij (Hertog Jan Brouwerij)',
 'Archers Brewing & Wholesale Limited',
 'Archipelago Brewing Company / Craft Beer Hub',
 'Arcobräu Gräfliches Brauhaus',
 'Arctic Craft Brewery',
 'Arctic Fox Brewing',
 'Ariana Brewery',
 "Arkell's Brewery Limited",
 'Arrow Brewing Company & Oak bar',
 'Art Cervesers - Cervesa Artesanal',
 'Artebirraia',
 'Artisan Brewing Co.',
 'Artos Breweries Ltd',
 'Arundel Brewery Limited',
 'Asahi Breweries Ltd',
 'Ascot Ales',
 'Asher Brewing Company',
 'Asheville Brewing Company',
 'Asheville Pizza and Brewing Co.',
 'Asia Pacific Breweries Ltd.',
 'Asmara Breweries',
 'Aspen Brewing Co.',
 'Ass Clown Brewing',
 'Ass Kisser Ales',
 'Associated Breweries & Distilleries Ltd.',
 'AstikA (Interbrew)',
 'Astoria Brewing Company / Wet Dog Cafe',
 'Athenian Brewery',
 'Atlantic Brewery',
 'Atlantic Brewing Company',
 'Atlas Brewery Ltd',
 'Atna Bryggeri',
 'Atomic Brewery',
 'Atwater Block Brewery',
 'Au Grimoire Des Légendes',
 'Auburn Alehouse Brewery & Restaurant',
 'Auchan Production',
 'AuerBräu AG',
 'Auerhahn Bräu Schlitz GmbH',
 'August Schell Brewing Company',
 'Augusta Brewing Company',
 'Augusta Bräu',
 'Augustiner Bräu Kloster Mülln OHG',
 'Augustiner-Bräu Wagner KG',
 'Australian Brewing Corporation',
 'Avalon Brewing',
 'Avery Brewing Company',
 'Avesta Bryggeri',
 'Avondale Brewing Company',
 'Awa Bakushu',
 'Azalea Coast Brewery',
 'Azumino Brewery',
 'B&O Station Brewery & Restaurant, The',
 'B&T Brewery Limited (Banks and Taylor)',
 'B.A.R.T.S.',
 "B.O.B.'s House Of Brews",
 'B.W. Beer Works USA',
 'B94',
 'BABB Srl',
 'BFM Brasserie Des Franches-Montagnes',
 'BIP (Beogradska Industrija Piva)',
 "BJ's Chicago Pizza & Brewery Inc.",
 "BJ's Pizza, Grill & Brewery",
 "BJ's Restaurant",
 "BJ's Restaurant & Brewery",
 "BJ's Restaurant & Brewhouse",
 "BJ's Restaurant And Brewhouse",
 'BJs Brewhouse',
 'BOSS Browar Witnica S.A.',
 'BVBA Brouwerij Broeder Jacob',
 "Babe's Bar-B-Que And Brewery",
 'Bacardi U.S.A., Inc.',
 'Back Forty Beer Co.',
 'Back Street Brewery',
 'Backcountry Brewery (Brewpub & Pizzeria)',
 'Backlash Beer Company',
 'Backwoods Brewing',
 'Backwoods Imports Inc.',
 'Bad Bear Brewing Company',
 'Bad Windsheimer Bürgerbräu',
 'Bad Wörishofer Lowenbrau / Privatbrauerei Hotel Braugasthof',
 'Badaevsky Pivovariniy Zavod',
 'Badger Brewery, Hall and Woodhouse',
 'Badische Staatsbrauerei Rothaus AG',
 'Badlands Brewery',
 'Baeren Brauerei',
 'Bahamian Brewery Beverage & Co. Ltd.',
 'Bahus Brewery',
 'Baias Garagardotegia',
 'Baikalskaya Pivovarenniya Kompaniya',
 'Baird Brewing Company / Fishmarket Taproom',
 'Baisinger Löwenbrauerei Teufel GmbH',
 'Baja Brewing Company',
 'Bakushukan Clair (Kure Beer)',
 'Bali Storm Brewing Company',
 "Ballard's Brewery Limited",
 'Balmain Brewing Company',
 'Baltika Breweries',
 'Baltika-Baku',
 'Baltimore Brewing Company',
 'Bandana Brewery',
 'Bandersnatch Brewpub',
 'Bandwagon Brew Pub',
 'Banff Ave. Brewing Co.',
 'Banfi',
 'Banjalu&#269;ka Pivara',
 'Bank Street Brewing',
 'Bank Top Brewery',
 'Banks (Barbados) Breweries Limited',
 'Banks Canada / Hart Brewing Co.',
 'Bar Harbor Brewing Company',
 'Bar Teerenpeli',
 'Barada Beer Company',
 "Bard's Tale Beer Company, LLC",
 'Bare Bones Grill & Brewery',
 'Barefoot Brewers',
 'Barefoot Brewery',
 'Barlan Beverages Group',
 'Barley Brothers Brewery & Grill',
 "Barley Brown's Brewpub",
 'Barley Creek Brewing Company',
 'Barley Days Brewery',
 'Barley Island Brewing Company',
 "Barley John's Brew Pub",
 'Barley Mill Brew Pub / Trifons Pizza',
 'Barley and Hops Grill & Microbrewery',
 "Barley's Brewing Company",
 "Barley's Casino & Brewing Co.",
 "Barley's Smokehouse and Brewpub",
 'Barnaul Brewery',
 "Barney's Beer",
 'Barngates Brewery Limited',
 'Barnsley Brewery Company Limited',
 'Baron Brewing',
 'Barons Brewing Company',
 'Barossa Brewing Company',
 'Barossa Valley Brewing',
 'Barrel House Brewing Company',
 'Barrier Brewing Company',
 'Barrington Brewery & Restaurant',
 'Barrio Brewing Company',
 'Bartrams Brewery',
 'Barum Brewery Limited',
 'Bass Brewers Limited',
 'Bastone',
 'Bat Creek Brewery',
 'Bath Ales Ltd',
 'Battered Boar Brewing Company',
 'Battersea Brewery',
 'Battlefield Brewing Co At The Pub',
 'Bauer´s Marchfelder Storchenbräu',
 'Bauskas Alus',
 'Bavaria Brewery',
 'Bavaria Brouwerij N.V.',
 'Bavaria-St. Pauli-Brauerei GmbH',
 'Bavarian Barbarian Brewing Company',
 'Baxter Brewing Co.',
 'Bayerische Brauerei Kaiserslautern',
 'Bayerische Löwenbrauerei',
 'Bayerische Staatsbrauerei Weihenstephan',
 'Bayern Brewing, Inc.',
 'Bayhawk Ales',
 'Baying Hound Aleworks',
 'Bayou Teche Brewery',
 'Bayreuther Bierbrauerei',
 'Bays Brewery Ltd.',
 "Bazens' Brewery Co Ltd",
 'Beach Brewing Company',
 'Beach Chalet Brewery and Restaurant',
 'Beacon Hotel (Sarah Hughes Brewery)',
 'Beamish & Crawford Plc.',
 'Bear Brewing',
 'Bear Creek Brewing Company',
 'Bear Republic Brewing Co.',
 'Beard & Brau',
 'Beartown Brewery',
 "Beau's All Natural Brewing Company",
 'Beauce Broue',
 'Beautiful Brews Inc.',
 'Beaver Beer Company',
 'Beaver Brewing Company',
 'Beaver Street Brewery & Whistle Stop Cafe',
 'Beba',
 'Beck Bräu OHG',
 'Becker Brewing Company',
 'Bedele Brewery',
 'Bedondaine & Bedons Ronds',
 'Bee Creek Brewery',
 'Beer Engine, The',
 'Beer Factory',
 'Beer Here',
 'Beer House',
 'Beer Lounge Microcervecería',
 'Beer Valley Brewing Co.',
 'Beer Works',
 'Beer Works Chitose',
 'Beer of Yerevan JSC',
 "Beermann's Beerwerks",
 'Befed',
 'Beijing Yanjing Beer Group Corporation',
 'Belco Beer House S/A',
 'Belfast Bay Brewing Co.',
 'Belgoobeer',
 'Belhaven Brewery Company Ltd.',
 'Belize Brewing Co., Ltd.',
 "Bell's Brewery, Inc.",
 'Bellarine Brewing Co.',
 'Belle Isle Restaurant & Brewery',
 'Bells Hotel and Brewery',
 'Belmont Brewing Company',
 'Beltaine Appennino Risorse',
 'Belvoir Brewery Ltd',
 'Ben Middlemiss Brewing',
 'Bend Brewing Company',
 'Benderskiy Pivovarinniy Zavod',
 'Benelux Brewpub & Café',
 'Bennexer Brauhaus',
 'Benny Brewing Company',
 'Bent River Brewing Company',
 'Beowulf Brewing Company',
 'Bere Azuga S.A.',
 'Bere Craiova (Brau-Union Romania)',
 'Bere Trei Stejari',
 'Berentsens Brygghus',
 'Berg Brauerei Ulrich Zimmermann GmbH & Co. KG',
 'Berghoeve Brouwerij',
 'Bergquell Brauerei Löbau',
 'Berkeley Alehouse (Pyramid Breweries)',
 'Berkshire Brewing Company Inc.',
 'Berliner Bürgerbräu GmbH',
 'Berliner Kindl Brauerei',
 'Berliner-Schultheiss-Brauerei GmbH',
 'Bernard Family Brewery',
 'Berrow Brewery',
 'Berwick Brewing Company',
 'Bethlehem Brew Works',
 'Better Bitters Brewing Co.',
 'Beverly Hills Beerhouse Company',
 'Bewdley Brewery Co.',
 'Beyerd Café-Restaurant-Brouwerij De Beyerd',
 'Bhutan Brewery PVT Ltd',
 'Bi-Du',
 'Bicu Genova',
 'Bierbrier Brewing Inc',
 "Bierbrouwerij 't Vølendam",
 'Bierbrouwerij De Boei',
 'Bierbrouwerij De Drie Ringen',
 'Bierbrouwerij De Halve Maan',
 'Bierbrouwerij De Keyzer',
 'Bierbrouwerij De Koningshoeven B.V.',
 'Bierbrouwerij De Leeuw B.V.',
 'Bierbrouwerij De Schans',
 'Bierbrouwerij Grand-Café  Emelisse',
 'Bierbrouwerij Maallust',
 'Bierbrouwerij Moerenburg',
 'Bierbrouwerij Mommeriete',
 'Bierbrouwerij Onder De Linden',
 'Bierbrouwerij Openluchtmuseum Arnhem',
 'Bierbrouwerij Sint Christoffel B.V',
 'Bierbrouwerij Sint-Servattumus',
 'Bierbrouwerij Volendam',
 'Bierhalle',
 'Bierland',
 'Bierwerk Afrikan Ales Brewery',
 'Big Al Brewing',
 'Big Bay Brewing Co.',
 'Big Bear Brewing Company',
 'Big Beaver Brewing Co.',
 'Big Boss Brewing',
 'Big Buck Brewery & Steakhouse #1',
 'Big Buck Brewery & Steakhouse #3',
 'Big City Brewing CO LTD',
 "Big Daddy's",
 "Big Dog's Draft House",
 'Big Easy Brewing Company',
 'Big Hole Brewing Company',
 'Big Horse Brew Pub / Horse Feathers & Co.',
 'Big Hurt Brewing Company',
 'Big Lamp Brewers',
 'Big Muddy Brewing',
 'Big Rapids Brewing Co.',
 'Big Ridge Brewing Co.',
 'Big River Brewpub',
 'Big River Grille & Brewing Works',
 'Big Rock Brewery',
 'Big Rock Chop & Brewhouse',
 'Big Sky Brewing Company',
 'Big Stick Brew Company',
 'Big Surf Beer Co.',
 'Big Tide Brewing Co.',
 'Big Time Brewery & Alehouse',
 'Big Woods Brewing Co.',
 'Biha&#263;ka Pivovara',
 "Bilbo's Pizza",
 "Bill's Tavern & Brewhouse",
 'Billabong Brewery',
 'Binding-Brauerei AG',
 'Binghams Brewery',
 'Bintang',
 'Bintara Brewery',
 'Birdsong Brewing',
 'Birdsview Brewing Company',
 'Birkenhead Brewery',
 'BirrOne',
 'Birra Amiata',
 'Birra Cascinazza',
 'Birra Del Borgo',
 'Birra Korça',
 'Birra Menabrea SpA',
 'Birra Moretti (Heineken)',
 'Birra Peja',
 'Birra Peroni Industriale S.p.A.',
 'Birra Tenute Collesi SRL',
 'Birra Tirana',
 'Birra Toccalmatto',
 'Birra Venezia',
 'Birreria Blues',
 'Birreria Pedavena',
 'Birreria San Gabriel',
 'Birrificio Artigianale Aleghe',
 'Birrificio Artigianale Gilac',
 'Birrificio Artigianale Zimella',
 'Birrificio Artigiano - Orzo Bruno',
 'Birrificio B.O.A. (Birrificio Ostiense Artigianale)',
 'Birrificio Bacherotti',
 'Birrificio Barchessa Di Villa Pola',
 'Birrificio Barley',
 'Birrificio Baüscia',
 'Birrificio Bresciano Montenetto',
 'Birrificio Brùton',
 'Birrificio Chevalier',
 'Birrificio Civale',
 'Birrificio Dada',
 'Birrificio Degli Archi',
 'Birrificio Del Ducato',
 "Birrificio Del Levante - Leone D'Oro",
 'Birrificio Di Como',
 'Birrificio Doppio Malto',
 'Birrificio Emiliano',
 'Birrificio F.lli Trami',
 'Birrificio Freccia',
 'Birrificio Geco',
 'Birrificio Hibu',
 'Birrificio Indipendente Elav',
 'Birrificio Italiano',
 "Birrificio L'Olmaia",
 "Birrificio L'Orso Verde",
 'Birrificio Lambrate',
 'Birrificio Lariano',
 'Birrificio Le Baladin',
 'Birrificio Lodigiano',
 'Birrificio Menaresta',
 'Birrificio Montegioco',
 'Birrificio Ostiense Artigianale (BOA)',
 'Birrificio Pausa Cafe',
 'Birrificio Rurale',
 'Birrificio San Paolo',
 'Birrificio Sguaraunda',
 'Birrificio Spluga',
 'Birrificio Ticinese',
 'Birrificio Troll',
 'Birrificio Valcavallina',
 'Birrificio Valscura',
 'Birrificio\xa0Cittavecchia',
 'Birrifico Artigianale Sammarinese Srl',
 'Bir\x9e&#371; Alus',
 'Bison Brewing',
 'Bistro Victoria',
 'Bitburger Brauerei',
 'Bitter End Brewery and Bistro',
 'Bitter End Brewing Co.',
 'Bitter Root Brewing',
 'Biyat H.J.',
 'Bières 23, Brasserie Artisanale',
 'Bières De Carcassonne',
 'Bières De Saint-Louis',
 'Bières de Chimay S.A.',
 'Bières de Ré',
 'Black Country Ales',
 'Black Creek Historic Brewery',
 'Black Diamond Brewing Co.',
 'Black Forest Brew Haus & Grill',
 'Black Fox Brewing',
 'Black Heart Brewery',
 'Black Isle Brewery Co Ltd',
 'Black Lotus Brewery',
 'Black Market Brewing Co.',
 'Black Oak Brewing Co.',
 'Black Raven Brewing Co.',
 'Black Rock Brewing Co.',
 'Black Rooster Brewery',
 'Black Sheep Brewery PLC',
 'Black Star Co-op Pub & Brewery',
 'Black Toad Brewing Co.',
 'Black Tooth Brewing Company',
 'Blackdown Brewery',
 'Blackfoot River Brewing Company',
 'Blacksburg Brewing Company',
 'Blacksmith Brewing',
 'Blackstone Restaurant & Brewery',
 'Blatant Beer',
 'Blind Lady Ale House & Automatic Brewing Co.',
 'Blindmans Brewery',
 'Blitz-Weinhard Brewing Co.',
 'Block 15 Brewery & Restaurant',
 "Bloomington Brewing Company / Lennie's",
 'BluCreek Brewing',
 'Blue & Gray Brewing Co.',
 'Blue Anchor',
 'Blue Corn Brewery',
 'Blue Corn Cafe and Brewery',
 'Blue Frog Grog & Grill',
 'Blue Heron BrewPub',
 'Blue Heron Brewing Company',
 'Blue Hills Brewery',
 'Blue House Mediterranean Cafe',
 'Blue Monkey Brewery',
 'Blue Pants Brewery LLC',
 'Blue Point Brewing Company',
 'Blue Tractor BBQ & Brewery',
 'Bluegrass Brewing Co.',
 "Bluegrass Brewing Co. - East St. Matthew's",
 'Bluetongue Brewery Pty Ltd',
 'Boak Beverage',
 'Boat Brewery / The Boat',
 'Boathouse Brewpub & Restaurant',
 'Boatrocker Brewing Company',
 'Boddingtons',
 'Bodega Artesana',
 'Boggart Hole Clough Brewing Co.',
 'Bogota Beer Company',
 'Bohemian Brewery',
 'Bolets Petràs SCP',
 'Boliarka-VT',
 'Bolten Brauerei',
 'Bonaventure Brewing',
 'Bondi Beer Company',
 'Boneyard Beer Company',
 'Bonfire Bistro & Brewery',
 'Bonfire Brewing Co.',
 'Bonnema Brewing Company',
 "Bonzzini's Brew Pub",
 'Boon Rawd Brewery Co., Ltd',
 'Bootleg Brewery',
 "Bootlegger's Brewery",
 'Boquébière',
 'Borealis Brewery',
 'Borg Brugghús',
 'Borinquen Grill & Brewing Company',
 'Borodino ZAO',
 'Borsodi Brewery',
 'Boscos Restaurant & Brewing Co. - Little Rock',
 'Boscos Restaurant & Brewing Co. - Nashville',
 'Boston Baked Beans Co.',
 'Boston Beer Company (Samuel Adams)',
 'Boston Breweries',
 'BottleTree Beer Co.',
 'Boulder Beer / Wilderness Pub',
 'Boulder Creek Brewing',
 'Boulevard Brewing Co.',
 'Bowen Island Brewing Co. Ltd.',
 'Bowland Beer Company Limited',
 'Bowman Ales Ltd.',
 'Bowral Brewing Company',
 'Box Steam Brewery',
 'Boxcar Brewing Co.',
 'Boyne River Brewing Company',
 'Boyntons Brewing Company',
 'Bozeman Brewing Company',
 "Br'hant",
 'Br. De Wijde Wereld / Leidsch Bier',
 'Bradfield Brewery',
 'Bragdy Ynys Môn',
 'Braidwood Brewery',
 'Brand Bierbrouwerij',
 'Brasal-Brasserie Allemande',
 'Brasil Kirin',
 'Brass Monkey Brewing Co.',
 'Brasserie Alauna',
 'Brasserie Almaza S.A.L.',
 'Brasserie Alphand',
 "Brasserie An Alarc'h",
 'Brasserie Artisanale Christophe Noyon',
 "Brasserie Artisanale De L'Ermitage",
 'Brasserie Artisanale De La Crécelle',
 'Brasserie Artisanale De La Divatte',
 'Brasserie Artisanale De Rullés SPRL',
 'Brasserie Artisanale Des Grands Cols',
 'Brasserie Artisanale Du Dauphiné',
 'Brasserie Artisanale Du Flo',
 'Brasserie Artisanale Du Trégor',
 'Brasserie Artisanale La Frasnoise',
 'Brasserie Artisanale Millevertus De Toernich',
 'Brasserie Artisanale Océane',
 'Brasserie Artisanale Ratz',
 'Brasserie Artisanale de Redange',
 'Brasserie Artisanale des Remparts',
 'Brasserie Artisanale des Vosges',
 'Brasserie Artisanale du Centre',
 'Brasserie Augrenoise',
 'Brasserie Authentique',
 'Brasserie BVM Inc.',
 'Brasserie Bailleux',
 'Brasserie Basabürüa',
 'Brasserie Battin',
 'Brasserie Belgh Brasse',
 'Brasserie Belle-Vue',
 'Brasserie Bièropholie',
 'Brasserie Bofferding',
 'Brasserie Bourganel',
 'Brasserie Britt de Bretagne',
 'Brasserie Brootcoorens',
 'Brasserie Cantillon',
 'Brasserie Caracole',
 'Brasserie Cardinal',
 'Brasserie Castelain',
 'Brasserie Caulier',
 'Brasserie Celtik',
 'Brasserie Celtique Du Pen Roc',
 'Brasserie Cooperative Tri Martolod',
 'Brasserie Coreff',
 'Brasserie Cornelyshaff á Heinescheid',
 'Brasserie Craig Allan',
 "Brasserie D'Annoeulin",
 "Brasserie D'Ebly",
 "Brasserie D'Ecaussinnes",
 "Brasserie D'Olt",
 'Brasserie De Bel Air',
 'Brasserie De Bellefois',
 'Brasserie De Bouillon / Le Marché De Nathalie',
 'Brasserie De Brunehaut',
 'Brasserie De Cazeau',
 'Brasserie De Clerck',
 'Brasserie De La Baie',
 'Brasserie De La Brière',
 'Brasserie De La Chapelle',
 'Brasserie De La Cote De Jade',
 'Brasserie De La Creuse',
 'Brasserie De La Pigeonnelle',
 'Brasserie De La Soif',
 'Brasserie De La Vallée De Chevreuse',
 'Brasserie De Lautène',
 'Brasserie De L\x92Abbaye Du Cateau',
 'Brasserie De Milly',
 'Brasserie De Monaco',
 'Brasserie De Mont-Saint-Guibert',
 'Brasserie De Pouldreuzic',
 'Brasserie De Saint Omer',
 'Brasserie De Saint-Sylvestre',
 "Brasserie De Silenrieux S.A. (Chez L'Père Sarrasin)",
 'Brasserie Des Abers',
 'Brasserie Des Alberes',
 'Brasserie Des Cimes',
 'Brasserie Des Fagnes',
 'Brasserie Des Gabariers (L&L Cognac SA)',
 'Brasserie Des Garrigues',
 'Brasserie Des Géants',
 'Brasserie Des Monts',
 'Brasserie Des Rateliers',
 'Brasserie Des Trois Fontaines',
 'Brasserie Des Trois Frontieres',
 'Brasserie Des Vignes',
 'Brasserie Deseveaux',
 'Brasserie Dieu Du Ciel',
 'Brasserie Distillerie Du Mont Blanc',
 'Brasserie Du Bocq',
 'Brasserie Du Brabant',
 'Brasserie Du Canardou',
 'Brasserie Du Chant Du Loup',
 'Brasserie Du Corsaire',
 'Brasserie Du Loch',
 'Brasserie Du Mont Salève',
 'Brasserie Du Sollier',
 'Brasserie Dubuisson Frères sprl',
 'Brasserie Dunham',
 'Brasserie Dupont sprl',
 'Brasserie Dutailly',
 'Brasserie Duyck',
 'Brasserie Ellezelloise',
 "Brasserie Etxeko Bob's Beer",
 'Brasserie Fantôme',
 "Brasserie Fischer / Fischer Biere d' Alsace",
 'Brasserie Gigi',
 "Brasserie Grain D'Orge",
 "Brasserie Grain d' Orge (Brasserie Jeanne d'Arc SA)",
 'Brasserie Haldemann',
 'Brasserie Heineken',
 'Brasserie Henri Monier',
 'Brasserie Henry',
 'Brasserie Hollbeer et Magasin',
 "Brasserie Kerav'ale",
 'Brasserie Kohler Rehm',
 'Brasserie La Barbiot',
 'Brasserie La Bierataise',
 'Brasserie La Binchoise (Malterie des Remparts)',
 'Brasserie La Choulette',
 'Brasserie La Morgat',
 'Brasserie La Rochefortoise',
 'Brasserie La Saint-Pierre',
 'Brasserie La Tongrinnoise',
 'Brasserie Lacu',
 'Brasserie Lancelot',
 'Brasserie Larché',
 'Brasserie Lauth',
 'Brasserie Le Gambrinus',
 'Brasserie Lebbe',
 'Brasserie Lefèbvre SA',
 'Brasserie Les Trois Brasseurs',
 'Brasserie Les Trois Fourquets',
 'Brasserie Lorraine S.A.',
 'Brasserie Mare Nostrum',
 'Brasserie Meteor',
 'Brasserie Michard',
 'Brasserie Morbraz',
 "Brasserie Mortal's Beers",
 "Brasserie Moulin D'Ascq",
 'Brasserie Mélusine',
 'Brasserie Nationale (Bofferding)',
 "Brasserie Nationale d'Haiti",
 "Brasserie Oc'Ale",
 'Brasserie Oxymore',
 'Brasserie Phylosophale',
 'Brasserie Piedboeuf',
 'Brasserie Pietra',
 'Brasserie Rancho El Paso',
 'Brasserie Rouget De Lisle',
 'Brasserie Saint Antoine Abbé',
 'Brasserie Saint Donat',
 'Brasserie Saint-Germain',
 'Brasserie Saint-Monon',
 'Brasserie Sainte-Colombe',
 'Brasserie Salaisons Hotel - BeierHaascht',
 'Brasserie Sancerroise',
 'Brasserie Schutzenberger',
 'Brasserie Seigneuriale (Sleeman Breweries Ltd.)',
 'Brasserie Simon',
 'Brasserie St. Feuillien',
 'Brasserie Tchequébec',
 'Brasserie Theillier',
 'Brasserie Thiriez',
 'Brasserie Tonnerre De Brest',
 'Brasserie Trois Dames',
 'Brasserie Uberach',
 'Brasserie Val De Sambre',
 'Brasserie Vanuxeem',
 'Brasserie Vervifontaine',
 'Brasserie Warenghem',
 'Brasserie Zannekin',
 "Brasserie d'Achouffe",
 "Brasserie d'Algiers",
 "Brasserie d'Annaba",
 "Brasserie d'Oc",
 "Brasserie d'Oran",
 "Brasserie d'Orgemont",
 "Brasserie d'Orval S.A.",
 'Brasserie de Bellevaux',
 'Brasserie de Blaugies',
 'Brasserie de Jandrain-Jandrenouille',
 'Brasserie de Luxembourg Mousel -Diekirch SA',
 'Brasserie de Rochefort',
 'Brasserie de Saverne',
 'Brasserie de Silly S.A.',
 'Brasserie de Skida',
 'Brasserie de Tahiti S.A.',
 "Brasserie de l'Abbaye des Rocs s.a.",
 "Brasserie de l'Abbaye du Val-Dieu",
 'Brasserie de la Senne',
 'Brasserie des Fontaines',
 'Brasserie des Sources',
 'Brasserie du Bouffay',
 'Brasserie du Boxer SA',
 'Brasserie du Chardon',
 'Brasserie la Botteresse de Sur-les-Bois',
 'Brasserie la Cornue',
 'Brasserie à Vapeur',
 'Brasseries De Bourbon',
 'Brasseries Du Maroc',
 'Brasseries Kronenbourg',
 'Brasseries Oldarki',
 'Brasseries Star Madagascar',
 "Brasseurs De L'Anse",
 'Brasseurs De La Mauricie',
 'Brasseurs De Montréal',
 'Brasseurs Du Monde',
 'Brasseurs Du Nord Inc.',
 'Brasseurs Illimités',
 'Brasseurs R.J.',
 'Brasseurs Sans Gluten',
 'Brau Brothers Brewing Co. LLC',
 'Brau Union Hungária Sörgyárak / Soprony Brewery',
 'Brau Union Österreich AG',
 'Braucommune in Freistadt',
 'Brauer-Vereinigung Pegnitz',
 'Brauerei Aare Bier',
 'Brauerei Albert Egger AG',
 'Brauerei Aldersbach',
 'Brauerei Allersheim GmbH',
 'Brauerei Alsfeld AG',
 'Brauerei Alwin Märkl',
 'Brauerei An Der Thomaskirche',
 'Brauerei August Gleumes GmbH',
 'Brauerei Baar AG',
 'Brauerei Baumer',
 'Brauerei Bayer Theinheim',
 'Brauerei Beck & Co.',
 'Brauerei BierVision Monstein AG',
 'Brauerei Bischofshof',
 'Brauerei Blauer Löwe',
 'Brauerei Brinkhoff GmbH',
 'Brauerei Bruckmüller',
 'Brauerei C. & A. Veltins GmbH & Co.',
 'Brauerei Clemens Härle',
 'Brauerei Diebels GmbH & Co KG',
 'Brauerei Eichhof',
 'Brauerei Eichhorn',
 'Brauerei Erharting',
 'Brauerei Eschenbräu',
 'Brauerei Eugen Häberlen',
 'Brauerei Falken AG',
 'Brauerei Felsenau AG',
 'Brauerei Fischerstube AG',
 'Brauerei Fohrenburg GmbH & Co',
 'Brauerei Forst AG',
 'Brauerei Franz Xaver Glossner',
 'Brauerei Frastanz',
 'Brauerei Frastanzer Genossenschaft M.b.H.',
 'Brauerei Friedel',
 'Brauerei Friedr. Jacob',
 'Brauerei Fritz Gutmann',
 'Brauerei Früh Am Dom',
 'Brauerei Fässla',
 'Brauerei G.A. Bruch',
 'Brauerei Ganter GmbH & Co. KG',
 'Brauerei Gasthof Hartmann',
 'Brauerei Gasthof Hotel Post',
 'Brauerei Gasthof Kundmüller',
 'Brauerei Gasthof Pfister GmbH',
 'Brauerei Gasthof Rittmayer',
 'Brauerei Gasthof Schwan - Schwanenbräu',
 'Brauerei Gebrüder Maisel GmbH & Co.',
 'Brauerei Gebrüder Röhrl',
 'Brauerei Georg Meister',
 'Brauerei Georg Michael GmbH',
 'Brauerei Gilde',
 'Brauerei Gold Ochsen GmbH',
 'Brauerei Goss',
 'Brauerei Gotha GmbH',
 'Brauerei Grandauer - Wildbräu Grafing vor München',
 'Brauerei Greifenklau',
 'Brauerei Grieskirchen AG',
 'Brauerei Grohe GmbH',
 'Brauerei Gundel',
 'Brauerei Gusswerk GmbH',
 'Brauerei Göller Zur Alten Freyung',
 'Brauerei Göss (Brau Union)',
 'Brauerei H. Müller AG',
 'Brauerei Haag',
 'Brauerei Hacklberg',
 'Brauerei Hainfeld Karl Riedmüller GmbH & Co. KG',
 'Brauerei Hans Hennemann',
 'Brauerei Hauf GmbH',
 'Brauerei Hebendanz',
 'Brauerei Heinrich Reissdorf',
 'Brauerei Heller GmbH',
 'Brauerei Heller-Trum / Schlenkerla',
 'Brauerei Herrenhausen KG',
 'Brauerei Hirt GmbH',
 'Brauerei Hofmann Hohenschwärz',
 'Brauerei Hofstetten Krammer GmbH & Co. KG',
 'Brauerei Im Füchschen',
 'Brauerei Johann Kneitinger',
 'Brauerei Johannes Grasser',
 'Brauerei Josef Greif',
 'Brauerei Josef Schimpfle',
 'Brauerei Josef Schweighart',
 'Brauerei Julius Hummel',
 'Brauerei Kapsreiter GmbH',
 'Brauerei Karg',
 'Brauerei Karl Strauss',
 'Brauerei Keesmann',
 'Brauerei Klaus Rauh GmbH',
 'Brauerei Konrad Will',
 'Brauerei Kraus',
 'Brauerei Königshof',
 'Brauerei Kühbach',
 'Brauerei Ladenburger GmbH',
 'Brauerei Lasser GmbH',
 'Brauerei Leikeim',
 'Brauerei Leonhard Schübel',
 'Brauerei Locher AG',
 'Brauerei Ludwig Erl',
 'Brauerei Maisach - Bräustüberl Maisach',
 'Brauerei Max Leibinger GmbH',
 'Brauerei Max Wolf GmbH',
 'Brauerei Michael Plank',
 'Brauerei Mittenwald',
 'Brauerei Mälzerei Otto Kienberger',
 'Brauerei Neder',
 'Brauerei Norbert Fischer',
 'Brauerei Ott',
 'Brauerei Otto Hübner',
 'Brauerei Penning Zeißler',
 'Brauerei Peter Griess',
 'Brauerei Pinkus Mueller',
 'Brauerei Puntigam (Brau Union)',
 'Brauerei Raschhofer',
 'Brauerei Reichold',
 'Brauerei Richard Wagner',
 'Brauerei Ried',
 'Brauerei Rosengarten AG',
 'Brauerei Rothmoos Anton Kirnberger',
 'Brauerei S. Riegele',
 'Brauerei Sauer',
 'Brauerei Schlappeseppel GmbH',
 'Brauerei Schleswig',
 'Brauerei Schloss Eggenberg',
 'Brauerei Schloss Starkenberg',
 'Brauerei Schlösser GmbH',
 'Brauerei Schrems',
 'Brauerei Schumacher',
 'Brauerei Schwarzach Walter Seeber',
 'Brauerei Schwechat (Brau Union)',
 'Brauerei Schäffler',
 'Brauerei Schützengarten AG',
 'Brauerei Sester GMBH',
 'Brauerei Sigwart KG',
 'Brauerei Spezial',
 'Brauerei Stadtguet',
 'Brauerei Steigerwald - Familie Hike',
 'Brauerei Stempfhuber',
 'Brauerei Stolz GmbH & Co. KG',
 'Brauerei Stöttner',
 'Brauerei Tucher Brau',
 'Brauerei Ustersbach',
 'Brauerei Vasold & Schmitt',
 'Brauerei Vormann',
 'Brauerei Weitnau',
 'Brauerei Weitra',
 'Brauerei Westheim GmbH',
 'Brauerei Wieselburg (Brau Union)',
 'Brauerei Winkler',
 'Brauerei Wittmann',
 'Brauerei Witzgall',
 'Brauerei Wolfshöhe K. u. F. Weber GmbH & Co. KG',
 'Brauerei Zehendner GmbH',
 'Brauerei Zipf (Brau Union)',
 'Brauerei Zirndorf GmbH',
 'Brauerei Zoller-Hof',
 'Brauerei Zur Sonne',
 'Brauerei Zwettl Karl Schwarz GmbH',
 'Brauerei zum Schwarzen Adler',
 ...]
In [19]:
df_names1 = pd.DataFrame(sorted(c), columns=['names_1'])
df_names1.to_csv('names1.csv', index=False)
In [20]:
sorted(d)
Out[20]:
['(405) Brewing Co',
 '(512) Brewing Co',
 '10 Barrel Brewing Co',
 '10 Barrel Brewing Co - Bend Pub',
 '10 Barrel Brewing Co - Denver',
 '10 Torr Distilling and Brewing',
 '10-56 Brewing Company',
 '101 Brewery',
 '101 North Brewing Company',
 '105 West Brewing Co',
 '10K Brewing',
 '10th District Brewing Company',
 '11 Below Brewing Company',
 '1188 Brewing Co',
 '12 Gates Brewing Company',
 '12 West Brewing Company',
 '12 West Brewing Company - Production Facility',
 '127 Brewing',
 '12Degree Brewing',
 '12welve Eyes Brewing',
 '13 Below Brewery',
 '13 Stripes Brewery',
 '13 Virtues Brewing Co',
 '1323 R & D',
 '14 Cannons Brewing Company',
 '14 Lakes Brewery',
 '1487 Brewery',
 '14er Brewing Company',
 '14th Star Brewing',
 '16 Lots Brewing',
 '16 Mile Brewing Co',
 '16 Stone Brewpub',
 '1623 Brewing CO, llc',
 '1718 Ocracoke Brewing',
 '1781 Brewing Company',
 '17th State Brewing Company',
 '180 and Tapped',
 '1817 Brewery',
 '1840 Brewing Company',
 '1850 Brewing Company',
 '18th Street Brewery',
 '1905 Brewing Company',
 '1912 Brewing',
 '192 Brewing',
 "1940's Brewing Company",
 '1st Republic Brewing Co',
 '2 Basset Brewery',
 '2 Dogz and A Guy Brewing',
 '2 Feet Brewing Company',
 '2 Row Brewing',
 '2 Silos Brewing Company',
 '2 Tones Brewing Co.',
 '2 Tread Brewing Co',
 '2 Way Brewing Company',
 '2 Witches Winery and Brewing Company',
 '20 Corners Brewing LLC',
 '210 Brewing Co',
 '212 Brewing Company',
 '217 Brew Works',
 '21st Amendment Brewery Cafe',
 "23 Brewing Company / Lizzie B's Cafe",
 '238 Brewing Company',
 '26 Degree Brewing Company',
 '28th State Brewing',
 '2C Family Brewing Co.',
 '2Kids Brewing Company',
 '2SP Brewing Company',
 '2Toms Brewing Company',
 '2nd Shift Brewing Co',
 '2nd Story Brewing Company',
 '3 Beards Beer Company',
 '3 Daughters Brewing',
 '3 Disciples Brewing',
 '3 Freaks Brewing Co',
 '3 Iron Brewing Company',
 '3 Keys Brewing Company, LLC',
 '3 Nations Brewing',
 '3 Sheeps Brewing Co',
 '3 Sheets Brewery',
 '3 Sons Brewing Co.',
 '3 Stars Brewing Co',
 '30 Mile Brewing Co.',
 '300 Suns Brewing Company',
 '32 North Brewing Co',
 '34 Degree North Experiment Station',
 '350 Brewing Co',
 '38 State Brewing',
 '380 Brewing Company',
 '38°-75° Brewing',
 '3cross Fermentation Cooperative',
 '3rd Degree Brewhouse',
 '3rd Planet Brewing',
 '3rd Rock Brewing Company',
 '3rd Turn Brewing',
 '3rd Wave Brewing Co',
 '4 By 4 Brewing Company',
 '4 Hands Brewing Co',
 '4 Noses Brewing Company',
 '400 North Brewing',
 '406 Brewing Company',
 '411 Broadway Ales',
 '412 Brews',
 '42 North Brewing Company',
 '4204 Main Street Brewing Co',
 '450 North Brewing Company @ Simmons Winery',
 '47 Hills Brewing Co',
 '49th State Brewing Co',
 '49th State Brewing Co - Anchorage',
 "4B's Brewery",
 '4kd Crick Brewery',
 '4th Tap Brewing Cooperative',
 '5 Alarm Brewing Co',
 '5 Lakes Brewing Co.',
 '5 Rabbit Cerveceria Inc',
 '5 Rights Brewing Co',
 '5 Rivers Brewing LLC',
 '5 Seasons Brewing Co - Prado',
 '5 Seasons Brewing Co - Westside',
 '5 Stones Artisan Brewery',
 '515 Brewing Co',
 '5150 Brewing At The Brass Tap Rocklin',
 '51st State Brewing Company',
 '51st Ward Beer Company',
 '54-40 Brewing Company',
 '550 Brewing',
 '559 Local Brewing',
 '56 Brewing',
 '5770 Brewery',
 '5th Element Brewing',
 '5x5 Brewing Co.',
 '6 & 40 Brewery/Taproom',
 '6 Bears & A Goat Brewing Company, LLC',
 '6 Bridges Brewing',
 '6 Degrees of Separation',
 '603 Brewery',
 '608 Brewing Company',
 '612 Brew',
 '6th and La Brea',
 '7 Devils Brewing Co',
 '7 Hermits Brewing Company',
 '7 Hills Brewing Company',
 '7 Locks Brewing',
 '7 Mile Brewery',
 '7 Seas Brewing Co',
 '7 Sins Brewery',
 '7 Sisters Brewing Co',
 '718BrewCafe, LLP',
 '734 Brewing Company',
 '7th Settlement Brewery',
 '7th Wave Brewing',
 '7venth Sun Brewery',
 '7venth Sun Tampa',
 '8 Bit Brewing Company',
 '8-Bit Aleworks',
 '81Bay Brewing Company',
 '841 Brewhouse',
 '8one8 Brewing Company',
 '8th Street Ale Haus',
 '8th Ward Brewing Company',
 '8th Wonder Brewery',
 '903 Brewers',
 '927 Beer Company',
 '95ate5 Brewpub',
 '9th Hour Brewing Company LLC',
 'A Little Madness Brewing Company',
 'ALT Brew / Greenview Brewing LLC',
 'Aardwolf Brewing Company',
 'Abandon Brewing',
 'Abandoned Building Brewery',
 'Abbey Brewing Co',
 'Abbey Brewing Company, LLC',
 'Aberrant Ales',
 'Abide Brewing Company',
 'Abigaile',
 'Abita Brewing Co',
 'Abjuration Brewing',
 'Able Baker Brewing',
 'Able Ebenezer Brewing Company',
 'Able Seedhouse and Brewery',
 "Abner's Restaurant Inc / Northern Michigan Beer",
 'Abnormal Beer Company',
 'Abolitionist Ale Works',
 'Abridged Beer Company',
 'Absolution Brewing Co',
 'Absolution By the Sea',
 'Accomplice Beer Company',
 'Acidulous Brewing Company',
 'Acopon Brewing Co',
 'Acoustic Ales Brewing Experiment',
 'Actual Brewing Company, LLC',
 'AdMerk Corp. Inc.',
 "Adam's Northwest Bistro / Twin Rivers Brewing",
 "Adelbert's Brewery LLC",
 'Adirondack Pub and Brewery',
 'Adirondack Toboggan Company Microbrewery',
 'Adobe Creek Brewing Company',
 'Adroit Theory Brewing Company',
 'Adventure Brewing Co',
 'Adventure South',
 'Aero Craft Brewing',
 'Aero Plains Brewing',
 'Aeronaut Brewing Company',
 'Aftershock Brewing Co',
 'Afterthought Brewing Company',
 'Against the Grain Beer Garden',
 'Against the Grain Brewery',
 'Against the Grain Production Facility',
 'Agonic Brewing Company LLC',
 'Agrarian Ales, LLC',
 'Ahnapee Brewery',
 'Aigean Ales',
 'Aiken Brewing Co',
 'Airline Brewing Company',
 'Airways Brewing Co',
 'Akademia Brewing Company',
 'Akasha Brewing Company',
 'Akronym Brewing LLC',
 'Akwesasne Mohawk Casino Resort',
 'Alameda Brewing Co',
 'Alameda Island Brewing Company',
 'Alamo Beer Co',
 'Alarmist Brewing Co',
 'Alaro Craft Brewery',
 'Albia Brewing Co',
 'Albion Malleable Brewing Company',
 'Albright Grove Brewing Company',
 'Alcatraz Brewing',
 'Alcazar Brewery, LLC.',
 'Alchemist Cannery',
 'Aldus Brewing Company',
 'Ale House Brewing Co',
 'Ale Republic',
 'Ale Spike',
 'AleSmith Brewing Co',
 'Aleman Brewing',
 'Alematic Artisan Ales',
 'Alesatian Brewing Co.',
 'Alesong Brewing and Blending',
 'Alesong Brewing and Tasting Room',
 'Alewerks Brewing Company',
 'Alewife Brewing Company',
 'Alexandria Brewing Company',
 'Alibi Ale Works',
 'Alien Brewpub',
 'Align Brewing Co',
 'All Rise Brewing Co',
 'All Saints Brewing Co',
 'All-American Ale Works',
 'Allagash Brewing Co',
 'Allegheny City Brewing',
 'Allegory Brewing',
 'Alliance Brewing Company',
 "Alligator Brewing / Tall Paul's Brewhouse",
 'Alloy Brewing Company',
 "Alltech's Lexington Brewing",
 'Alluvial Brewing Company',
 'Alma Brewing Co',
 'Almanac Beer Company',
 'Aloha Beer Co',
 'Alosta Brewing Co',
 'Alpenglow Beer Co',
 'Alpha Acid Brewing Co',
 'Alpha Brewing Co.',
 'Alphabet City Brewing Co',
 'Alpine Brewing Co',
 'Alpine Dog Brewing Co',
 'Alta Brewing Company',
 'Altamont Beer Works',
 'Alter Brewing Company',
 'Altered Code Brewing',
 'Alternation Brewing Company',
 'Altitude Chophouse and Brewery',
 'Altmeyer and Lewis Brewing Company',
 'Altruist Brewing Company',
 'Altstadt Brewery',
 'Alvarado Street Brewery',
 'Alvarado Street Brewery & Grill',
 'Alvarium Beer Company',
 'Amador Brewing Company',
 'Amalgam Brewing',
 'Amazon Beer',
 'Amber Lantern Brewing Company',
 'Ambitious Ales',
 'Amelia Island Brewing Company',
 'American Badass Beer',
 'American Brewers Inc',
 'American Harvest Brewpub At Schoolcraft College',
 'American Honor Beer Co',
 'American Icon Brewery',
 'American Solera',
 'Americana Brewery',
 'Amerisports Brew Pub',
 'Amherst Brewing Co / Hangar Pub and Grill',
 'Ammo Brewing',
 'Amorys Tomb Brewing Co',
 'Amplified Ale Works',
 'Amplified Ale Works Miramar Studio',
 'Anacapa Brewing Co',
 'Anacortes Brewery/Rockfish Grill',
 'Anawan Brewing Company',
 'Ancestry Brewery',
 'Anchor Brewing Co',
 'Anchorage Brewing Co',
 'Ancient City Brewing Co.',
 'Andean Brewing',
 'Anderson Valley Brewing Co',
 'Andrews Brewing Co',
 'Andrews Brewing Co At Calaboose Cellars',
 'Angel City Brewery',
 'Angelina Brewing Company, LLC.',
 'Angry Chair Brewing, LLC.',
 'Angry Erik Brewing',
 'Angry Fish Brewing Company',
 'Angry Horse Brewing',
 'Angry Inch Brewing',
 'Angry James Brewing Co',
 'Angry Minnow, The',
 'Angry Scotsman Brewing',
 'Angry Troll Brewing / 222 Public House',
 'Anheuser-Busch InBev',
 'Anheuser-Busch Inc - Fort Collins',
 'Anheuser-Busch Inc - Los Angeles',
 'Anheuser-Busch Inc â\x80\x93 Baldwinsville',
 'Anheuser-Busch Inc â\x80\x93 Cartersville',
 'Anheuser-Busch Inc â\x80\x93 Columbus',
 'Anheuser-Busch Inc â\x80\x93 Fairfield',
 'Anheuser-Busch Inc â\x80\x93 Houston',
 'Anheuser-Busch Inc â\x80\x93 Jacksonville',
 'Anheuser-Busch Inc â\x80\x93 Merrimack',
 'Anheuser-Busch Inc â\x80\x93 Newark',
 'Anheuser-Busch Inc â\x80\x93 Williamsburg',
 'Animas Brewing Co',
 'Ankrolab Brewing Company',
 'Anthem Brewing Co',
 'Antietam Brewery,LLC',
 'Anvil Brewing',
 'Apex Brew And BBQ Supplies',
 'Apex Brewery',
 'Apocalypse Ale Works',
 'Apocalypse Brew Works',
 'Appalachian Brewing Co - Collegeville',
 'Appalachian Brewing Co - Gettysburg Gateway',
 'Appalachian Brewing Co - Harrisburg',
 'Appalachian Brewing Co - Lititz',
 'Appalachian Brewing Co - Mechanicsburg',
 'Appalachian Mountain Brewery - Boone',
 'Appalachian Mountain Brewery - Portsmouth',
 'Apple Blossom Brewing Co',
 'Appleton Beer Factory',
 'Apponaug Brewing',
 'Aquabrew',
 'Aquatic Brewing LLC',
 'Aqueduct Brewing',
 'Aransas Pass Brewing Company, LLC',
 'Arbeiter Brewing Company',
 'Arbor Brewing Co',
 'Arbor Brewing Co. Microbrewery',
 'Arcadia Brewing Co',
 'Arcadian Moon',
 'Arcana Brewing Company',
 'Arch Rock Brewing Co',
 'Archaic Craft Brewery At Centro',
 'Arches Brewing',
 'Archetype Brewing',
 'Arclight Brewing Company',
 'Arcpoint Brewing Co',
 'Ardent Craft Ales',
 'Ardent Spirits, Inc',
 "Argilla Brewing Co @ Pietro's Pizza",
 'Argyle Brewing Company',
 'Arizona Craft Brewing',
 'Arizona Wilderness Brewing',
 'Arkane Aleworks',
 'Arkose Brewery',
 'Arlington Club',
 'Armada Brewing',
 'Armistice Brewing Company',
 'Armory Beer Company',
 'Armstrong Brewing Co',
 'Around the Bend Beer Co.',
 'Arrow Lodge Brewing',
 'Arrowhead Ales Brewing Company',
 'Arrowood Farms',
 'Artesanales Cervezarte C.A.',
 'Artifex Brewing Company',
 'Artisan Owl Brewing Company',
 'Artisanal Brew Works',
 'Arts District Brewing Company',
 'Arundel Cellars & Brewing Co',
 'Arvon Brewing Co.',
 'Asbury Park Brewery',
 'Ascension Brewing Company',
 'Asgard Brewing Company',
 'Ashby Brewing Company',
 'Asher Brewing Co',
 'Asheville Brewing Co',
 'Ashton Brewing Company',
 'Ashtown Brewing Co',
 'Ashuelot Brewing Company',
 'Aslan Brewing Company',
 'Aslin Beer Company',
 'Aspen Brewing Company',
 'Aspetuck Brew Lab',
 'Ass Clown Brewing Co',
 'Assembly brewing',
 'Aston Abbey Brewing Co',
 'Astoria Brewing Company',
 'Astro Lab Brewing',
 'Astronomy Aleworks',
 'Asylum Brewing',
 'At Ease Brewing',
 'At Large Brewing',
 'Atco Brewing LLC',
 'Athens Brewing Co',
 'Athletic Brewing Company',
 'Atlantic Beach Brewing Company',
 'Atlantic Brewing Co',
 'Atlantic Brewing Co / Bar Harbor Brewing',
 'Atlas Brew Works',
 'Atom Brewing Company',
 'Atomic Ale Brewpub and Eatery',
 'Atrevida Beer Company',
 'Attaboy Beer',
 'Attic Brewing',
 'Atwater Brewing Co',
 'Atwater Grand Rapids',
 'Atwater In the Park Biergarten and Tap House',
 'Atwood Ales',
 'Auburn Alehouse',
 'Auburn Brewing Company',
 'Audacious Aleworks',
 'August Schell Brewing Co',
 'Augusta Brewing Co',
 'Augustino Brewing',
 'Aurochs Brewing Company',
 'Aurora Ale & Lager',
 'Ausable Brewing Co',
 "Austin Brothers' Beer Company",
 'Austin Street Brewery',
 'Automatic Brewing Co. / Blind Lady Alehouse',
 'Avalanche Brewing Co',
 'Avant Garde Beer',
 'Avery Brewing Co',
 'Avondale Brewing Co',
 'Axe And Arrow Brewery',
 'Axle Brewing Company',
 'B Chord Brewing Company',
 "B and J's Handcrafted Texas Ales",
 'B-52 Brewing',
 'B-CS  Zoigl Brewery',
 'B.C. Brewery',
 "B.O.B's Brewery",
 'B9 Beverages Inc',
 'BAD Brewing Co',
 'BADSONS Beer Co',
 'BAKFISH Brewing Company',
 'BBGB Brewery And Hop Farm',
 'BDD Brewing Company',
 "BJ's Brewhouse",
 'BJs Restaurant & Brewery - Boulder',
 'BJs Restaurant & Brewery - Brea',
 'BJs Restaurant & Brewery - Chandler',
 'BJs Restaurant & Brewery - Reno',
 'BJs Restaurant & Brewery - West Covina',
 'BKS Artisan Ales',
 'BLUE RIDGE BREWING',
 'BNS Brewing & Distilling Co.',
 'BRU Handbuilt Ales and Eats',
 'BS Brewing',
 'BW Farm, LLC',
 'Baa Baa Brewhouse',
 "Babe's Bar-B-Que and Brewhouse",
 'Bacchus Brewing',
 'Back Alley Brewing Co',
 'Back Bay Brewing Company',
 'Back Channel Brewing',
 'Back Creek Brewing Company',
 'Back East Brewing',
 'Back Forty Beer Co',
 'Back Pedal Brewing Company',
 'Back Pew Brewing',
 'Back Street Brewery & Tasting Room',
 'BackRoad Brewery',
 'BackStory Brewery',
 'Backacre Beermakers',
 'Backlash Beer Co',
 'Backman Brewing Company',
 'Backpocket Brewing Co',
 'Backroom Brewery',
 'Backshore Brewing Co.',
 'Backside Brewing Co',
 'Backslope Brewing',
 'Backstep Brewing Company',
 'Backswing Brewing Co.',
 'Backward Flag Brewing',
 'Backwoods Brewing Company',
 'Backwoods Brewing Company - Production Only',
 'Backyard Barn Winery and Microbrewery',
 'Backyard Brewery',
 'Bad Beat Brewing',
 'Bad Bulldogs Brewery',
 'Bad Dad Brewery',
 'Bad Habit Brewing Company',
 'Bad Idea Brewing',
 "Bad Jimmy's Brewing Co",
 'Bad Lab Beer Co.',
 "Bad Martha Farmer's Brewery",
 'Bad Shepherd Beer Company',
 'Bad Tom Smith Brewing',
 'Bad Water Brewing',
 'Bad Weather Brewing LLC',
 'BadWolf Brewing Company',
 'Badass Backyard Brewing',
 'Badger Hill Brewing',
 'Badger Mountain Brewing',
 'Badger State Brewing Company',
 'Baere Brewing Company',
 'Baerlic Brewing Co',
 'Baffin Brewing Co',
 'Bag and Kettle, The',
 'Bagby Beer Company',
 'Baileson Brewing Company, LLC',
 'Bainbridge Island Brewing',
 'Baithouse Brewery',
 'Baker City Brewing Co',
 'Bald Man Brewing Company',
 'Bald Top Brewing Company',
 'Bale Breaker Brewing Company',
 'Baleen Brewing Co.',
 'Ballad Brewing',
 'Ballast Point Brewing Co / Home Brew Mart',
 'Ballast Point Brewing Company - Little Italy',
 'Balsam Falls Brewing',
 'Balter Beerworks',
 'Baltimore Washington Beer Works',
 'Bam Entertainment Center',
 'Band Of Bohemia',
 'Band of Brothers Brewing Company',
 'Banded Brewing Company',
 'Banded Oak Brewing Company',
 'Bandera Ale Project, LLC',
 'Bandit Brewing Co',
 'Bandon Brewing Company',
 'Bandwagon Brewery',
 'Bang Brewing',
 'Bang the Drum Brewery',
 'Banger Brewing Co',
 "Bangin' Banjo Brewing Company",
 'Bangor Beer Co.',
 'Bangor Trust Brewing',
 'Banjo Brewing',
 'Bank Brewing Co',
 'Bankhead Brewing Company',
 "Banning's Inc",
 'Bar 3 BBQ and Brewing',
 'Bar D Brew House',
 'Bar Hygge / Brewery Techne',
 'Baranof Island Brewing Co',
 'Barbarian Brewing',
 "Bard's Brewery, LLC",
 'Bardo Brewpub',
 'Bardwell Winery and Brewery',
 'Bare Arms Brewing',
 'Bare Bones Brewery',
 'Bare Hands Brewery',
 'BareWolf Brewing',
 'Barebottle Brewing Company',
 'Barhop Brewing',
 'Bark Brewing Company',
 'BarkEater Craft Brewery',
 'Barking Armadillo Brewing',
 'Barking Duck Brewing Company',
 'Barley Boys Brewery',
 'Barley Brothers Brewery',
 'Barley Browns Brewpub',
 'Barley Creek Brewing Co',
 'Barley Forge Brewing',
 'Barley Head Brewery',
 'Barley Island Brewing Co',
 'Barley Johns Brewpub',
 'Barley Naked Brewing Company',
 'Barley Sprouts Brewery',
 'Barley and Hops Grill',
 "Barley's Brewing Company (Ale House No. 1)",
 "Barley's Casino and Brewing Co",
 'Barn Brewers',
 'Barn Brewery',
 'Barn Town Brewing Co.',
 'Barnaby Brewing Company',
 'Barnett and Son Brewing Company',
 'Barnhouse Brewery',
 'Barnshed Brewing',
 'Barnstable Brewing',
 'Barnstar Brewing',
 'Barrage Brewing Co',
 'Barrel Assembly',
 'Barrel Brothers Brewing Company',
 'Barrel Brothers Brewing Company, LLC',
 'Barrel Culture Brewing And Blending',
 'Barrel Dog Brewing',
 'Barrel Harbor Brewing Co.',
 'Barrel Head Brewhouse',
 'Barrel House Z',
 'Barrel Mountain Brewing',
 'Barrel Oak Farm Taphouse',
 'Barrel Theory Beer Company',
 'Barrel and Beam',
 'Barrel of Monks Brewing',
 'Barreled Souls Brewing Company LLC',
 'Barrelhouse Brewing',
 'Barrels & Bottles Brewery',
 'Barrie Brewing Company',
 'Barrier Brewing Co',
 'Barrington Brewery and Restaurant',
 'Barrio Brewing Co',
 'Barrow Brewing Company',
 'Barsideous Brewing Co',
 'Bartlett Hall',
 'Bascule Brewery And Public House',
 'Base Camp Brewing Co',
 'Basement Brewers of Texas',
 'Basic City Beer Co.',
 'Basket Case Brewing Co',
 'Bastard Brothers Brewing Company',
 'Bastet Brewing',
 'Bastion Brewing Company',
 'Bastone Brewery',
 'Batch Brewing Co',
 'Batch Craft Beer and Kolaches',
 'Bath Brewing Company',
 'Bathtub Row Brewing Co-op',
 'Battered Boar Brewing Co',
 'Battery Steele Brewing',
 'Battle Born® Beer',
 'Battle Hill Brewing Company',
 'Battle Road Brewing Co',
 'BattleHops Brewing',
 'Battlefield Brew Works',
 'Battlefield Brewing Co',
 'Battlemage Brewing Co',
 'Bauhaus Brew Labs',
 'Bavarian Bierhaus',
 'Bavarian Brothers Brewing',
 'Baxter Brewing Co, LLC',
 'Bay Brewing Company',
 'Bay Bridge Brewing Co',
 'Bay City Brewing Co.',
 'Bay State Beer Company',
 'Bayern Brewing Inc.',
 'Bayonet Cider Company',
 'Bayou Teche Brewing',
 'Beach Chalet Brewing Co',
 'Beach Grease Beer Company',
 'Beach Haus Brewery',
 'BeachFly Brewing Company',
 'Beachside Brew Pub',
 'Beachwood Blendery',
 'Beachwood Brewing',
 'Beale St Brewing Co',
 "Beale's",
 'Bear Bones Beer',
 'Bear Chase Brewing Company',
 'Bear Creek Brews',
 'Bear Island Brewing Company',
 'Bear Republic Brewing Co',
 'Bear Republic Brewing Co - Production facility',
 'Bear Republic Brewing Co Pub & Restaurant - Lakeside',
 'Bear Roots Brewing Company',
 'Bear and Bramble Brewing Company',
 'BearWaters Brewing Co',
 'Beara Brewing Co.',
 'Bearded Fox Brewing Co.',
 'Bearded Iris Brewing',
 'Bearded Owl Brewing',
 'Beards Brewery',
 'Beards Brewery Production Facility',
 'Beardslee Public House',
 'Bearpaw River Brewing Co',
 'Beaver Beer Co',
 "Beaver Brewery At Mo's Place",
 'Beaver Brewing Co',
 'Beaver Creek Brewery',
 'Beaver Island Brewing Company',
 'Beaver Street Brewery',
 'Beaverhead Brewing Co',
 'Beavers Bend Brewery',
 'Bedlam Brewing LLC',
 'Beech Mountain Brewing Company',
 'Beehive Basin Brewery',
 'Beer Army',
 'Beer By Design Brewery',
 'Beer Church Brewing Company',
 'Beer Hound Brewery',
 'Beer Is Good Brewing Company',
 'Beer Lab HI',
 'Beer Naked Brewery',
 'Beer Religion',
 'Beer Research Institute, The',
 'Beer Tree Brew Co',
 'Beer Valley Brewing Co',
 'Beerded Brothers Brewing',
 'Beerfoot Beach Bar',
 'Begyle Brewing',
 'Belching Beaver Brewery',
 'Belching Beaver Brewery Tavern & Grill',
 'Belching Beaver Brewery Vista',
 'Belding Brewing Company',
 'Belfast Bay Brewing Co',
 'Belford Brewing Company',
 'Belgian Mare Brewery',
 "Bell's Brewery, Inc",
 "Bell's Eccentric Cafe",
 'Bella Brewing',
 'Bella Casa Di Vino',
 'Belle Isle Restaurant and Brewing Co',
 'Bellefonte Brewing Co',
 'Bellevue Brewing Co',
 'Bellport Brewing Company',
 'Belltown Brewery',
 'Bellwether Brewing Co',
 'Belly Love Brewing Company',
 'Belly Up Beer Company',
 'Belmont Brewing Co',
 'Below the Radar Brewing Co',
 'Beltway Brewing Company',
 'Bemidji Brewing Company',
 "Ben's Tune-Up / Ben's Beer",
 'Benchmark Brewing Co',
 'Benchtop Brewing Company',
 'Bend Brewing Co',
 'Bend Brewing High Desert',
 'Benford Brewing Co.',
 'Bengal Brewing, LLC.',
 'Benjamin Beer Co',
 "Bennidito's Brewpub",
 'Benny Brewing Co.',
 'Benoit-Casper Brewing',
 'Benson Brewery',
 'Bent Bine Brew Co. LLC',
 'Bent Brewstillery',
 'Bent Hill Brewery',
 'Bent Kettle Brewing Company',
 'Bent Paddle Brewing Co',
 'Bent River Brewing Co',
 'Bent River Brewing Co Production & Tasting Room',
 'Bent Run Brewing Co',
 'Bent Shovel Brewing',
 'Bent Water Brewing Company',
 'Bentonville Brewing Co',
 "Berchman's Brewing Company",
 'Berkeley Springs Brewing Co',
 'Berkley Beer Company',
 'Berkshire Brewing Co Inc',
 'Berlin Brewing Company',
 'Berthoud Brewing Company',
 'Bertrams Salmon Valley Brewery',
 'Berts Brewing Co.',
 'Berwick Brewing Co',
 'Bevel Craft Brewing',
 'Beyond The Mountain Brewing Company',
 'Bhramari Brewhouse',
 'Bias Brewing',
 'Bier Brewery and Taproom',
 'Bier Distillery',
 'Bier One Brewing',
 'BierWerks',
 'Biercamp',
 'Biere De Mac Brew Works',
 'Bierly Brewing @ Eats and Treats',
 'Bierstadt Lagerhaus',
 'Big Alice Brewing Company',
 'Big Ash Brewing Company',
 'Big Axe Brewing Company',
 'Big Barn Brewing Co / Bodacious Berries Fruits and Brews',
 'Big Beach Brewing Company',
 'Big Bear Brewing Co',
 'Big Bear Lake Brewing Co',
 'Big Beaver Brewing Co',
 'Big Bend Brewing Co',
 'Big Block Brewing',
 'Big Blue Brewing',
 'Big Boiler Brewing',
 'Big Boss Brewing Co',
 'Big Bottom Brewery',
 'Big Cat Brewing Company',
 'Big Choice Brewing Co',
 'Big Data Brewing Company LLC',
 'Big Ditch Brewing Company',
 "Big Dog's Brewing Co",
 'Big Elm Brewing',
 'Big Frog Brewing Company',
 'Big Grove Brewery',
 'Big Grove Brewery & Tap Room',
 'Big Hart Brewing Company',
 'Big Head Brewing Co.',
 'Big Horse Brewpub',
 'Big Inlet Brewing',
 'Big Island Brewhaus',
 'Big Lake Brewing',
 'Big Lake Brewing Production Facility',
 'Big Leaf Brewing',
 'Big Lick Brewing Company',
 'Big Lost Meadery and Brewery',
 'Big Lug Canteen',
 'Big Muddy Brewing Co',
 'Big Rack Brew Haus',
 'Big Rip Brewing Company',
 'Big River Brewery, LLC',
 'Big River Grille & Brewing Works - Chattanooga',
 'Big River Grille & Brewing Works - Disney',
 'Big Rock Chop House & Brewery',
 'Big Sexy Brewing Company',
 'Big Sioux Brewing Company',
 'Big Sky Brewing Co',
 'Big Slide Brewery',
 'Big Storm Brewing',
 'Big Storm Brewing Co.',
 'Big Stump Brewing Company',
 'Big Texan Brewery',
 'Big Thompson Brewery',
 'Big Thorn Farm and Brewery',
 'Big Timber Brewing',
 'Big Time Brewery',
 'Big Top Brewing Company',
 'Big Truck Brewery',
 'Big Tupper Brewing',
 'Big Ugly Brewing Co',
 'Big Water Brewery',
 'Big Wood Brewery',
 'Big Woods Brewery',
 "Big's BBQ Brewpub",
 'Bigelow Brewing Company',
 'Bike Dog Brewing Co',
 'Bike Rack Brewing Co',
 'Bike Rack Brewing Co - 8th Street Market',
 'Bike TrAle Brewing',
 'Biker Brew House',
 "Bilbo's Pizza and Brewery",
 "Bill's Brewing Co.",
 "Bill's Front Porch Pub and Brewery",
 "Bill's Tavern and Brewhouse",
 'Billsburg Brewery',
 'Biloba Brewing',
 'Biloxi Brewing Co',
 'Biltmore Brewing Company',
 'Bine Valley Brewing',
 'Binghamton Brewing Co',
 'Bingo Beer Company',
 "Birch's On the Lake",
 'Bircus Brewing',
 'Bird Brain Brewing Company',
 'Bird Nickel Brewing Company',
 'Bird Street Brewing',
 'BirdFish Brewing Co',
 'Birdboy Brewing Co',
 'Birds Fly South Ale Project',
 'BirdsView Brewing Co',
 'Birdsong Brewing Co.',
 'Birmingham District Brewing',
 'Birravino',
 'Birreria @ Eataly',
 'Birthright Brewing Co.',
 'Biscayne Bay Brewing Co',
 'Bismarck Brewing',
 'Bison Brewing Co',
 'Bissell Brothers Brewing',
 'Bistronomy B2B Craft Brewery',
 'Bitter Brothers Brewing Co.',
 'Bitter Creek Brewing Co',
 'Bitter Old Fecker Rustic Ales',
 'Bitter Root Brewing Co',
 'Bitter Sisters Brewing Company',
 'Bittersweet Brewing Company, LLC',
 'Black Acre Brewing Co',
 'Black Beak Brewing',
 'Black Bottle Brewery',
 'Black Bridge Brewery',
 'Black Cat Brewing',
 'Black Circle Brewing Co',
 'Black Cloister Brewing Co',
 'Black Cock Brewing Company',
 'Black Creek Brewery',
 'Black Doubt Brewing Company',
 'Black Eagle Brewery',
 'Black Fire Winery',
 'Black Flag Brewing Company',
 'Black Fleet Brewing',
 'Black Forest Brewery',
 'Black Forest Brewing Company',
 'Black Frog Brewing Co',
 'Black Gold Brewing Co',
 'Black Hammer Brewing',
 'Black Hat Brew Works',
 'Black Hog Brewing Co',
 'Black Hoof Brewing',
 'Black Horizon Brewing Company',
 'Black Horse Brewery',
 'Black Husky Brewing LLC',
 'Black Label Brewing Company',
 'Black Leg Brewery',
 'Black Lotus Brewing Co',
 'Black Magic Brewing',
 'Black Market Brewing Co',
 'Black Mesa Brewing Company',
 'Black Narrows Brewing Company',
 'Black Plague Brewing',
 'Black Pond Brews',
 'Black Project Spontaneous & Wild Ales',
 'Black Raven Brewing Co',
 'Black Sands Brewery',
 'Black Shirt Brewing Co',
 'Black Sky Brewery',
 'Black Star Co-op',
 'Black Tooth Brewing Co',
 'Black Walnut Brewery',
 'Black Warrior Brewing Co.',
 'Black Wolf Brewing LLC',
 'BlackRock Brewers',
 'BlackStack Brewing',
 'Blackadder Brewing Company',
 "Blackbeard's Brewing Company",
 'Blackberry Farm Brewery',
 'Blackfoot River Brewing Co',
 'Blackhorse Brewery',
 'Blackhorse Pub and Brewery',
 'Blackhorse Pub and Brewery -  Knoxville',
 'Blacklist Artisan Ales',
 'Blackrock Beer Co',
 'Blackrocks Brewery',
 'Blackrocks Brewery - Production Facility',
 'Blacksmith Brewing Co',
 'Blackstone Brewing Co',
 'Blackwater Brewing Co',
 'Blackwater Draw Brewing Company',
 'Blaker Brewing',
 'Blank Canvas Brewery',
 'Blasted Barley Beer Company',
 'Blasty Bough Brewing Company',
 'Blazing Tree Brewery',
 'Bleeding Heart Brewery',
 'Blewett Brewing Company',
 'Blind Bat Brewery LLC, The',
 'Blind Owl Brewery',
 'Blind Pig Brewery',
 'Blind Squirrel Brewery',
 'Block 15',
 'Block 15 Brewery & Tap Room',
 'Block Brewing Company',
 'Bloom Brew',
 'Bloomer Brewing Co',
 'Bloomington Brewing Co',
 'Bloomington Brewing Co - Production Facility',
 'Blowing Rock Brewing Company',
 'Blue Blaze Brewing',
 'Blue Blood Brewing Company',
 'Blue Canoe Brewing Co',
 'Blue Collar Brewery, Inc.',
 'Blue Corn Cafe',
 'Blue Cow Cafe/Big Rapids Brewing Co',
 'Blue Earl Brewing Company',
 'Blue Frog Brewing Company',
 'Blue Ghost Brewing Company',
 'Blue Heron Brew Pub',
 'Blue Heron Brewing',
 'Blue Island Beer Co',
 'Blue Line Brewery',
 'Blue Moon Brewery',
 'Blue Mountain Barrel House and Organic Brewery',
 'Blue Mountain Pizza and Brew Pub',
 'Blue Nose Brewery',
 'Blue Note Brewing Company',
 'Blue Oak Brewing Co',
 'Blue Owl Brewing',
 'Blue Pants Brewery',
 'Blue Pike Cantina',
 'Blue Point Brewing',
 'Blue Point Brewing Co',
 'Blue Ridge Brewing Co/Foothills Brewing',
 'Blue Ridge Community College',
 'Blue Skye Brewery',
 'Blue Spruce Brewing',
 'Blue Spruce Brewing Littleton',
 'Blue Stallion Brewing Company',
 'Blue Star Brewing Co',
 'Blue Tractor Brewing Co',
 'Blue Wolf Brewing Company LLC',
 'Blue Zone Brewing Co.',
 'BlueTarp Brewing Co',
 'Bluebird Brasserie',
 'Bluebird Microcreamery and Brewery',
 'Bluebonnet Beer Co',
 'Bluegrass Brewing Co - Brewpub',
 'Bluejacket',
 'Blueprint Brewing Co.',
 'Blues City Brewing Co',
 'Bluestone Brewing Company',
 'Bluewater Brewing Co',
 'Blü Dragonfly Brewing',
 'Bnaf, LLC',
 'Boak Brewing Co',
 ...]
In [21]:
df_names2 = pd.DataFrame(sorted(d), columns=['names_2'])
df_names2.to_csv('names2.csv', index=False)

Ok, so now I have two csv files with unique brewery names that I can compare in Notepad++.

Once I have done this, I can read in new csv files that list the matches I found.

In [22]:
df_names1_analyzed = pd.read_csv('names1_analyzed.csv')
df_names1_analyzed
Out[22]:
names_1
0 (512) Brewing Company
1 10 Barrel Brewing Co.
2 16 Mile Brewing Company
3 192 Brewing Company
4 2nd Shift Brewery
... ...
280 Lucky Bucket Brewing Co.
281 Maui Brewing Co. (Brewpub)
282 Nebraska Brewing Company
283 Thunderhead Brewing Company
284 Upstream Brewing Company - Old Market

285 rows × 1 columns

In [23]:
df_names2_analyzed = pd.read_csv('names2_analyzed.csv')
df_names2_analyzed
Out[23]:
names_2
0 (512) Brewing Co
1 10 Barrel Brewing Co
2 16 Mile Brewing Co
3 192 Brewing
4 2nd Shift Brewing Co
... ...
280 Lucky Bucket Brewing Co
281 Maui Brewing Co, Brewpub
282 Nebraska Brewing Co - La Vista
283 Thunderhead Brewing Co
284 Upstream Brewing Co (#1)

285 rows × 1 columns

Sweet, so I was able to find 285 additional breweries. This process was a lot less painful than I imagined, but I am sure that there is a better way to go about doing this.

If my reviewer has any suggestions about an alternative approach I could of taken that would be awesome!!

In [24]:
# create dictionary from these brewery names. 
names_dict = dict(zip(df_names2_analyzed['names_2'], df_names1_analyzed['names_1']))
In [25]:
names_dict
Out[25]:
{'(512) Brewing Co': '(512) Brewing Company',
 '10 Barrel Brewing Co': '10 Barrel Brewing Co.',
 '16 Mile Brewing Co': '16 Mile Brewing Company',
 '192 Brewing': '192 Brewing Company',
 '2nd Shift Brewing Co': '2nd Shift Brewery',
 '3 Stars Brewing Co': '3 Stars Brewing Company',
 '4 Hands Brewing Co': '4 Hands Brewing Co.',
 '5 Rabbit Cerveceria Inc': '5 Rabbit Cerveceria',
 '7 Seas Brewing Co': '7 Seas Brewery and Taproom',
 '7venth Sun Brewery': '7venth (Seventh) Sun Brewery',
 'Abbey Brewing Co': 'Abbey Brewing Co.',
 'Abita Brewing Co': 'Abita Brewing Co.',
 "Adelbert's Brewery LLC": "Adelbert's Brewery",
 'Adirondack Pub and Brewery': 'Adirondack Pub & Brewery',
 'Against the Grain Brewery': 'Against The Grain Brewery & Smokehouse',
 'Aiken Brewing Co': 'Aiken Brewing Company',
 'Airways Brewing Co': 'Airways Brewing Company',
 'Alameda Brewing Co': 'Alameda Brewhouse',
 'Alamo Beer Co': 'Alamo Beer Co.',
 'Alcatraz Brewing': 'Alcatraz Brewing Co.',
 'Alcazar Brewery, LLC.': 'Alcazar Brewing Co.',
 'AleSmith Brewing Co': 'AleSmith Brewing Company',
 'Allagash Brewing Co': 'Allagash Brewing Company',
 "Alltech's Lexington Brewing": 'Alltech’s Lexington Brewing and Distilling Co.',
 'Almanac Beer Company': 'Almanac Beer Co.',
 'Altitude Chophouse and Brewery': 'Altitude Chophouse & Brewery',
 'American Honor Beer Co': 'American Honor Beer Company',
 'Amherst Brewing Co / Hangar Pub and Grill': 'Amherst Brewing Co.',
 'Anacapa Brewing Co': 'Anacapa Brewing Company',
 'Anacortes Brewery/Rockfish Grill': 'Anacortes Brewery & Rockfish Grill',
 'Anchor Brewing Co': 'Anchor Brewing Company',
 'Anchorage Brewing Co': 'Anchorage Brewing Company',
 'Anderson Valley Brewing Co': 'Anderson Valley Brewing Company',
 'Angel City Brewery': 'Angel City Brewing',
 'Angry Minnow, The': 'Angry Minnow Brewpub',
 'Anheuser-Busch InBev': 'Anheuser-Busch',
 'Appalachian Brewing Co - Harrisburg': 'Appalachian Brewing Company',
 'Arbor Brewing Co': 'Arbor Brewing Company',
 'Arcadia Brewing Co': 'Arcadia Brewing Company',
 'Asher Brewing Co': 'Asher Brewing Company',
 'Asheville Brewing Co': 'Asheville Brewing Company',
 'Aspen Brewing Company': 'Aspen Brewing Co.',
 'Ass Clown Brewing Co': 'Ass Clown Brewing',
 'Atlantic Brewing Co': 'Atlantic Brewing Company',
 'Atlas Brew Works': 'Atlas Brewery Ltd',
 'Atwater Brewing Co': 'Atwater Block Brewery',
 'August Schell Brewing Co': 'August Schell Brewing Company',
 'Augusta Brewing Co': 'Augusta Brewing Company',
 'Avery Brewing Co': 'Avery Brewing Company',
 'Avondale Brewing Co': 'Avondale Brewing Company',
 "B.O.B's Brewery": "B.O.B.'s House Of Brews",
 "BJ's Brewhouse": 'BJs Brewhouse',
 "Babe's Bar-B-Que and Brewhouse": "Babe's Bar-B-Que And Brewery",
 'Back Forty Beer Co': 'Back Forty Beer Co.',
 'Back Street Brewery & Tasting Room': 'Back Street Brewery',
 'Backlash Beer Co': 'Backlash Beer Company',
 'Backwoods Brewing Company': 'Backwoods Brewing',
 'Bandwagon Brewery': 'Bandwagon Brew Pub',
 'Bare Bones Brewery': 'Bare Bones Grill & Brewery',
 'Barley Brothers Brewery': 'Barley Brothers Brewery & Grill',
 'Barley Browns Brewpub': "Barley Brown's Brewpub",
 'Barley Creek Brewing Co': 'Barley Creek Brewing Company',
 'Barley Island Brewing Co': 'Barley Island Brewing Company',
 'Barley Johns Brewpub': "Barley John's Brew Pub",
 'Barley and Hops Grill': 'Barley and Hops Grill & Microbrewery',
 "Barley's Brewing Company (Ale House No. 1)": "Barley's Brewing Company",
 "Barley's Casino and Brewing Co": "Barley's Casino & Brewing Co.",
 'Barrelhouse Brewing': 'Barrel House Brewing Company',
 'Barrier Brewing Co': 'Barrier Brewing Company',
 'Barrington Brewery and Restaurant': 'Barrington Brewery & Restaurant',
 'Barrio Brewing Co': 'Barrio Brewing Company',
 'Battered Boar Brewing Co': 'Battered Boar Brewing Company',
 'Battlefield Brewing Co': 'Battlefield Brewing Co At The Pub',
 'Baxter Brewing Co, LLC': 'Baxter Brewing Co.',
 'Bayern Brewing Inc.': 'Bayern Brewing, Inc.',
 'Bayou Teche Brewing': 'Bayou Teche Brewery',
 'Beach Chalet Brewing Co': 'Beach Chalet Brewery and Restaurant',
 'Bear Creek Brews': 'Bear Creek Brewing Company',
 'Bear Republic Brewing Co': 'Bear Republic Brewing Co.',
 'Beaver Beer Co': 'Beaver Beer Company',
 'Beaver Brewing Co': 'Beaver Brewing Company',
 'Beaver Street Brewery': 'Beaver Street Brewery & Whistle Stop Cafe',
 'Belfast Bay Brewing Co': 'Belfast Bay Brewing Co.',
 "Bell's Brewery, Inc": "Bell's Brewery, Inc.",
 'Belle Isle Restaurant and Brewing Co': 'Belle Isle Restaurant & Brewery',
 'Belmont Brewing Co': 'Belmont Brewing Company',
 'Bend Brewing Co': 'Bend Brewing Company',
 'Benny Brewing Co.': 'Benny Brewing Company',
 'Bent River Brewing Co': 'Bent River Brewing Company',
 'Berkshire Brewing Co Inc': 'Berkshire Brewing Company Inc.',
 'Berwick Brewing Co': 'Berwick Brewing Company',
 'Big Bear Brewing Co': 'Big Bear Brewing Company',
 'Big Beaver Brewing Co': 'Big Beaver Brewing Co.',
 'Big Boss Brewing Co': 'Big Boss Brewing',
 "Big Dog's Brewing Co": "Big Dog's Draft House",
 'Big Horse Brewpub': 'Big Horse Brew Pub / Horse Feathers & Co.',
 'Big Muddy Brewing Co': 'Big Muddy Brewing',
 'Big River Brewery, LLC': 'Big River Brewpub',
 'Big River Grille & Brewing Works - Chattanooga': 'Big River Grille & Brewing Works',
 'Big Rock Chop House & Brewery': 'Big Rock Chop & Brewhouse',
 'Big Sky Brewing Co': 'Big Sky Brewing Company',
 'Big Time Brewery': 'Big Time Brewery & Alehouse',
 'Big Woods Brewery': 'Big Woods Brewing Co.',
 "Bill's Tavern and Brewhouse": "Bill's Tavern & Brewhouse",
 'Birdsong Brewing Co.': 'Birdsong Brewing',
 'Bison Brewing Co': 'Bison Brewing',
 'Bitter Root Brewing Co': 'Bitter Root Brewing',
 'Black Creek Brewery': 'Black Creek Historic Brewery',
 'Black Lotus Brewing Co': 'Black Lotus Brewery',
 'Black Market Brewing Co': 'Black Market Brewing Co.',
 'Black Raven Brewing Co': 'Black Raven Brewing Co.',
 'Black Star Co-op': 'Black Star Co-op Pub & Brewery',
 'Black Tooth Brewing Co': 'Black Tooth Brewing Company',
 'Blackfoot River Brewing Co': 'Blackfoot River Brewing Company',
 'Block 15 Brewery & Tap Room': 'Block 15 Brewery & Restaurant',
 'Bloomington Brewing Co': "Bloomington Brewing Company / Lennie's",
 'Blue Corn Cafe': 'Blue Corn Cafe and Brewery',
 'Blue Heron Brew Pub': 'Blue Heron BrewPub',
 'Blue Heron Brewing': 'Blue Heron Brewing Company',
 'Blue Pants Brewery': 'Blue Pants Brewery LLC',
 'Blue Point Brewing Co': 'Blue Point Brewing Company',
 'Blue Tractor Brewing Co': 'Blue Tractor BBQ & Brewery',
 'Bluegrass Brewing Co - Brewpub': 'Bluegrass Brewing Co.',
 'Boak Brewing Co': 'Boak Beverage',
 'Bohemian Brewery and Grill': 'Bohemian Brewery',
 'Bonaventure Brewing Co': 'Bonaventure Brewing',
 'Boneyard Beer Co': 'Boneyard Beer Company',
 'Bonfire Brewing': 'Bonfire Brewing Co.',
 'Bootleggers Brewery': "Bootlegger's Brewery",
 'Boston Beer Co - DBA Samuel Adams Brewing Co': 'Boston Beer Company (Samuel Adams)',
 'Bottle Tree Beer Co': 'BottleTree Beer Co.',
 'Boulevard Brewing Co': 'Boulevard Brewing Co.',
 'Bozeman Brewing Co': 'Bozeman Brewing Company',
 'Brau Brothers Brewing Co': 'Brau Brothers Brewing Co. LLC',
 'Breakwater Brewing Co': 'Breakwater Brewing',
 'BrewDog Brewing Company, LLC': 'BrewDog',
 'Brewbakers Brewing Co': 'Brewbakers Brewing',
 "Brewer's Alley Restaurant and Brewery": "Brewer's Alley Restaurant & Brewery",
 'Brick House Brewery and Restaurant': 'Brick House Brewery',
 'Bridge Brew Works': 'Bridge Brew Works LLC',
 'Bristol Brewing Co': 'Bristol Brewing Company',
 'Broad Ripple Brewing Co': 'Broad Ripple Brew Pub',
 'Broadway Brewery': 'Broadway Brewery & Restaurant',
 'Broken Tooth Brewing Co': 'Broken Tooth Brewing',
 "Brown's Brewing Co": "Brown's Brewing Company",
 'Brutopia Brewery and Kitchen': 'Brutopia',
 'Bubes Brewery': "Bube's Brewery",
 'Buffalo Bills Brewery': "Buffalo Bill's Brewery",
 'Buffalo Water Beer Co': 'Buffalo Water Beer Company',
 'Bull & Bush Brewery': 'Bull & Bush Pub & Brewery',
 'Burnside Brewing Co': 'Burnside Brewing Co.',
 'Buzzards Bay Brewing Co': 'Buzzards Bay Brewing',
 'COAST Brewing Co': 'COAST Brewing Company',
 'Caldera Brewing Co': 'Caldera Brewing Company',
 'Calfkiller Brewing Co': 'Calfkiller Brewing Company',
 'Cambridge Brewing Co': 'Cambridge Brewing Company',
 'Cannon Brewpub': 'Cannon Brew Pub',
 'Cape Ann Brewing Co': 'Cape Ann Brewing Company',
 'Capital Brewery Co Inc': 'Capital Brewery',
 'Capitol City Brewing Co': 'Capitol City Brewing Company',
 'Captain Lawrence Brewing Co': 'Captain Lawrence Brewing Co.',
 'Carolina Brewing Co': 'Carolina Brewing Company',
 'Carters Brewing': "Carter's Brewing",
 'Carton Brewing Co': 'Carton Brewing Company',
 'Carver Brewing Co': 'Carver Brewing Co.',
 'Cascade Lakes Brewing Co': 'Cascade Lakes Brewing Co.',
 'Castle Rock Beer Company': 'Castle Rock Brewery',
 'Central Waters Brewing Co': 'Central Waters Brewing Company',
 'Charleville Vineyard & Microbrewery': 'Charleville Vineyard Microbrewery',
 'Charlie and Jakes Brewery and Grille': "Charlie & Jake's Brewery & Grille",
 'Chick Brewing Company': 'Chick Brewing Co.',
 'Cigar City Brewing Co': 'Cigar City Brewing',
 'Cisco Brewers': 'Cisco Brewers Inc.',
 'Cismontane Brewing Co': 'Cismontane Brewing Company',
 'City Brewing Co': 'City Brewing Company, LLC',
 'City Steam Brewery': 'City Steam Brewery Café',
 'Climax Brewing Co': 'Climax Brewing Company',
 'Coastal Empire Beer Co': 'Coastal Empire Beer Company',
 'Cocoa Beach Brewing Co': 'Cocoa Beach Brewing Company',
 'Denver Beer Co': 'Denver Beer Co.',
 'Detroit Beer Co': 'Detroit Beer Co.',
 'Devils Backbone Brewing Co - Basecamp': 'Devils Backbone Brewing Company',
 'Diamond Bear Brewing Co': 'Diamond Bear Brewing Company',
 'Dicks Brewing Co': "Dick's Brewing Company",
 'Dogfish Head Craft Brewery': 'Dogfish Head Brewery',
 'East End Brewing Co': 'East End Brewing Company',
 'Element Brewing Co': 'Element Brewing Company',
 'Elevation 66 Brewing Co': 'Elevation 66 Brewing',
 'Elevator Brewery and Draught Haus': 'Elevator Brewery & Draught Haus',
 'Elk Head Brewing Co': 'Elk Head Brewing Company',
 'Ellicott Mills Brewing Co': 'Ellicott Mills Brewing Company',
 'Ellicottville Brewing Co': 'Ellicottville Brewing Co.',
 'Elm City Brewing Company': 'Elm City Brewing Co.',
 'Elysian Brewing Co': 'Elysian Brewing Company',
 'Empire Brewing Co': 'Empire Brewing Company',
 'Feral Brewing Company': 'Feral Brewing Co.',
 'Ferguson Brewing Co': 'Ferguson Brewing Company',
 'Fiddlehead Brewing': 'Fiddlehead Brewing Company',
 'FiftyFifty Brewing Co': 'FiftyFifty Brewing Co.',
 'Figueroa Mountain Brewing': 'Figueroa Mountain Brewing Co.',
 'Figure Eight Brewing Co': 'Figure 8 Brewing',
 'Finger Lakes Beer Co': 'Finger Lakes Beer Company',
 'Firestone Walker Brewing Co': 'Firestone Walker Brewing Co.',
 'Flagstaff Brewing Co': 'Flagstaff Brewing Company',
 'Flat Branch Pub and Brewing': 'Flat Branch Pub & Brewery',
 'Flat Earth Brewing Co': 'Flat Earth Brewing Company',
 'Flat Tail Brewing Co': 'Flat Tail Brewing',
 'Flathead Lake Brewing Co': 'Flathead Lake Brewing Company',
 'Florida Beer Co': 'Florida Beer Company',
 'Flossmoor Station Brewing Co': 'Flossmoor Station Restaurant & Brewery',
 'Flyers Restaurant and Brewery': 'Flyers Restaurant And Brewery',
 'Flying Bison Brewing Co': 'Flying Bison Brewing Company',
 'Flying Fish Brewing Co': 'Flying Fish Brewing Company',
 'Flying Goose Brewpub': 'Flying Goose Brew Pub & Grille',
 'Foothills Brewing and Beverage Co': 'Foothills Brewing & Beverage Co.',
 'Fossil Fuels Brewing Co': 'Fossil Fuels Brewing Company',
 'Founders Brewing Co': 'Founders Brewing Company',
 'Fountain Square Brewing Co': 'Fountain Square Brewing Co.',
 'Four Horsemen Brewery': 'Four Horsemen Brewing Company',
 'Four Peaks Brewing Co': 'Four Peaks Brewing Company',
 'Four Sons Brewing': 'Four Sons Brewery Restaurant',
 'Free State Brewing Co': 'Free State Brewing Co.',
 'Freetail Brewing Co': 'Freetail Brewing Company',
 'Fremont Brewing Co': 'Fremont Brewing Company',
 'Full Circle Brewing Co': 'Full Circle Brewing Co.',
 'Galena Brewing Co': 'Galena Brewing Company',
 'Gilded Otter Brewing Co': 'Gilded Otter Brewing Company',
 'Gilgamesh Brewing Co': 'Gilgamesh Brewing',
 'Glacier Brewing Co': 'Glacier Brewing Company',
 'Good People Brewing Co': 'Good People Brewing Company',
 'Goose Island Beer Co / Fulton St': 'Goose Island Beer Co.',
 'Great Basin Brewing Co': 'Great Basin Brewing Co.',
 'Great Lakes Brewing Co': 'Great Lakes Brewing Company',
 'Harvest Moon Brewing': 'Harvest Moon Brewery',
 'Highland Brewing Co': 'Highland Brewing Company Ltd.',
 "Hoppin' Frog Brewing Co": "Hoppin' Frog Brewery",
 'Hoppy Brewing Co': 'Hoppy Brewing Co.',
 'Indian Wells Brewing Co': 'Indian Wells Brewing Company',
 'Island Brewing Co': 'Island Brewing Company',
 'Lake Placid Pub and Brewery': 'Lake Placid Craft Brewing Company',
 'Lake Superior Brewing Co': 'Lake Superior Brewing Company',
 'Lakefront Brewery Inc': 'Lakefront Brewery, Inc.',
 'Legacy Brewing Company': 'Legacy Brewing Co.',
 'Legend Brewing Co': 'Legend Brewing Co.',
 'Mad Fox Brewing Co': 'Mad Fox Brewing Company',
 'Mad River Brewing Co': 'Mad River Brewing Company',
 'Madcap Brew Co.': 'Madcap Brewery',
 'Madhouse Brewing Co': 'Madhouse Brewing Company',
 'Madison River Brewing Co': 'Madison River Brewing Company',
 'Magic Hat Brewing Co/ North American Breweries': 'Magic Hat Brewing Company',
 'New Holland Brewing Co': 'New Holland Brewing Company',
 'New Jersey Beer Co': 'New Jersey Beer Company',
 'New Planet Beer Co': 'New Planet Beer Company',
 'New South Brewing Co': 'New South Brewing Co.',
 'Red Eye Brewing Co': 'Red Eye Brewing Company',
 'Red Hill Brewing Co': 'Red Hill Brewery',
 'Red Lodge Ales Brewing Co': 'Red Lodge Ales',
 'Red Oak Brewing Co': 'Red Oak Brewery',
 'Revival Brewing': 'Revival Brewing Co.',
 'Rockmill Brewing Co': 'Rockmill Brewery',
 'Rockslide Brewing Co': 'Rockslide Brewery & Pub',
 'Ruby Mountain Brewing Co': 'Ruby Mountain Brewing',
 'Ruckus Brewing Co': 'Ruckus Brewing',
 'Sackets Harbor Brewing Co': 'Sackets Harbor Brewing Co.',
 'Saint Arnold Brewing Co': 'Saint Arnold Brewing Company',
 'Sherwood Brewing Co': 'Sherwood Brewing Company',
 'Sleepy Dog Brewing Co': 'Sleepy Dog Brewery',
 'Slippery Pig Brewery': 'Slippery Pig Brewing',
 'Telegraph Brewing Co': 'Telegraph Brewing Company',
 'Telluride Brewing Co': 'Telluride Brewing Co.',
 'Westbrook Brewing Co': 'Westbrook Brewing Co.',
 'Crow Peak Brewing Co': 'Crow Peak Brewing',
 'Dempseys Brewery, Pub': "Dempsey's Brewery, Restaurant & Pub",
 'Empyrean Brewing Co': 'Empyrean Brewing Company',
 'Fargo Brewing Co': 'Fargo Brewing Company',
 'Gordon Biersch Brewery Restaurant - Honolulu': 'Gordon Biersch Brewery Restaurant',
 'Green Flash Brewing Co': 'Green Flash Brewing Co.',
 'Hawaii Nui Brewing': "Hawai'i Nui Brewing / Mehana Brewing Company",
 'Kona Brewing Co': 'Kona Brewing Co.',
 'Lazy Magnolia Brewing Co, LLC': 'Lazy Magnolia Brewing Company',
 'Lucky Bucket Brewing Co': 'Lucky Bucket Brewing Co.',
 'Maui Brewing Co, Brewpub': 'Maui Brewing Co. (Brewpub)',
 'Nebraska Brewing Co - La Vista': 'Nebraska Brewing Company',
 'Thunderhead Brewing Co': 'Thunderhead Brewing Company',
 'Upstream Brewing Co (#1)': 'Upstream Brewing Company - Old Market'}

Now we can use names_dict to rename the brewery names in brewery_df.

In [26]:
brewery_df.brewery_name.replace(names_dict, value=None, inplace=True)
In [27]:
# did this work?
brewery_df.query("brewery_name == '(512) Brewing Company'")
Out[27]:
id brewery_name brewery_type street address_2 address_3 city state county_province postal_code country longitude latitude phone website_url updated_at created_at
6220 6596 (512) Brewing Company micro 407 Radam Ln Ste F200 NaN NaN Austin Texas NaN 78745-1197 United States NaN NaN 5.129212e+09 http://www.512brewing.com 2018-08-11T21:39:44.563Z 2018-07-24T01:34:17.326Z

Looks like it worked. Now we can rerun the matches code.

In [28]:
x = beer_df_clean['brewery_name'].unique().tolist()
y = brewery_df['brewery_name'].unique().tolist()
matches = list(set(x).intersection(y))
len(matches)
Out[28]:
553

Awesome, we were able to match all 285 breweries, 268 (initial matches) + 285 (additional matches) = 553.

Now we can merge these matched breweries to beer_df_clean, so that we can have location data for each brewery in our main dataframe. This will allow us to make the geospatial viz.

In [29]:
beer_df_clean = beer_df_clean.merge(brewery_df, on='brewery_name')
beer_df_clean
Out[29]:
brewery_id brewery_name beer_beerid beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate ... state county_province postal_code country longitude latitude phone website_url updated_at created_at
0 1075 Caldera Brewing Company 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.0 4.5 4.0 4.0 ... Oregon NaN 97520-3709 United States -122.663374 42.183738 5.414825e+09 http://www.calderabrewing.com 2018-08-24T15:46:07.515Z 2018-07-24T01:34:05.325Z
1 1075 Caldera Brewing Company 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 3.5 3.5 3.0 ... Oregon NaN 97520-3709 United States -122.663374 42.183738 5.414825e+09 http://www.calderabrewing.com 2018-08-24T15:46:07.515Z 2018-07-24T01:34:05.325Z
2 1075 Caldera Brewing Company 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.5 3.5 3.5 4.0 ... Oregon NaN 97520-3709 United States -122.663374 42.183738 5.414825e+09 http://www.calderabrewing.com 2018-08-24T15:46:07.515Z 2018-07-24T01:34:05.325Z
3 1075 Caldera Brewing Company 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 2.5 3.5 2.0 ... Oregon NaN 97520-3709 United States -122.663374 42.183738 5.414825e+09 http://www.calderabrewing.com 2018-08-24T15:46:07.515Z 2018-07-24T01:34:05.325Z
4 1075 Caldera Brewing Company 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 4.0 3.0 3.5 3.5 ... Oregon NaN 97520-3709 United States -122.663374 42.183738 5.414825e+09 http://www.calderabrewing.com 2018-08-24T15:46:07.515Z 2018-07-24T01:34:05.325Z
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 18139 Pacific Beach Ale House 58076 Amber Wave American Amber / Red Ale 5.4 3.5 3.0 4.0 4.0 ... California NaN 92109-3905 United States -117.255265 32.794255 8.585812e+09 http://www.pbalehouse.com 2018-08-24T00:00:37.852Z 2018-07-24T01:32:59.216Z
563962 18139 Pacific Beach Ale House 58078 Shipwrecked Stout American Stout 6.5 3.0 3.0 4.0 3.5 ... California NaN 92109-3905 United States -117.255265 32.794255 8.585812e+09 http://www.pbalehouse.com 2018-08-24T00:00:37.852Z 2018-07-24T01:32:59.216Z
563963 18139 Pacific Beach Ale House 58078 Shipwrecked Stout American Stout 6.5 3.5 3.5 3.5 4.0 ... California NaN 92109-3905 United States -117.255265 32.794255 8.585812e+09 http://www.pbalehouse.com 2018-08-24T00:00:37.852Z 2018-07-24T01:32:59.216Z
563964 18139 Pacific Beach Ale House 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 4.0 3.5 4.0 3.5 ... California NaN 92109-3905 United States -117.255265 32.794255 8.585812e+09 http://www.pbalehouse.com 2018-08-24T00:00:37.852Z 2018-07-24T01:32:59.216Z
563965 18139 Pacific Beach Ale House 69416 Belgian Dubbel Dubbel 6.0 3.0 3.5 3.0 2.5 ... California NaN 92109-3905 United States -117.255265 32.794255 8.585812e+09 http://www.pbalehouse.com 2018-08-24T00:00:37.852Z 2018-07-24T01:32:59.216Z

563966 rows × 27 columns

In [30]:
# number of unique beers with ratings
len(beer_df_clean.beer_name.unique())
Out[30]:
9621

So, I've decided to merge beer_df_clean and brewery_df using an inner join. This will drop my number of rows down to 563,966 from 1,518,814.

For the purposes of this analysis, this is more than enough data. Especially since there are over 9,000 unique beers with ratings, and over 550 unique breweries with location data.

Let's make some final touches to finish cleaning beer_df_clean.

In [31]:
beer_df_clean.columns
Out[31]:
Index(['brewery_id', 'brewery_name', 'beer_beerid', 'beer_name', 'beer_style',
       'beer_abv', 'review_overall', 'review_aroma', 'review_appearance',
       'review_palate', 'review_taste', 'id', 'brewery_type', 'street',
       'address_2', 'address_3', 'city', 'state', 'county_province',
       'postal_code', 'country', 'longitude', 'latitude', 'phone',
       'website_url', 'updated_at', 'created_at'],
      dtype='object')
In [32]:
beer_df_clean.drop(columns=['id', 'street', 'postal_code', 'country', 'phone',
                            'website_url', 'updated_at'], inplace=True)
In [33]:
# rearrange column headers. 
beer_df_clean = beer_df_clean[['beer_beerid', 'beer_name', 'beer_style', 'beer_abv', 'review_overall',
                               'review_aroma', 'review_appearance', 'review_palate', 'review_taste',
                               'brewery_id', 'brewery_name', 'brewery_type', 'city', 'state',
                               'longitude', 'latitude']]
beer_df_clean
Out[33]:
beer_beerid beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate review_taste brewery_id brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.0 4.5 4.0 4.0 4.5 1075 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 3.5 3.5 3.0 3.5 1075 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.5 3.5 3.5 4.0 4.0 1075 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 2.5 3.5 2.0 3.5 1075 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 4.0 3.0 3.5 3.5 4.0 1075 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.5 3.0 4.0 4.0 3.5 18139 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.0 3.0 4.0 3.5 3.5 18139 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.5 3.5 3.5 4.0 4.0 18139 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 4.0 3.5 4.0 3.5 3.5 18139 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.0 3.5 3.0 2.5 3.5 18139 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563966 rows × 16 columns

In [34]:
# We don't need two unique id's.
beer_df_clean.drop(columns=['brewery_id'], inplace=True)
C:\Users\mcmco\anaconda3\lib\site-packages\pandas\core\frame.py:3997: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
In [35]:
# just noticed this odd column name.
beer_df_clean.rename(columns={'beer_beerid':'beer_id'}, inplace=True)
C:\Users\mcmco\anaconda3\lib\site-packages\pandas\core\frame.py:4133: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
In [36]:
# double check for any NaN values. 
beer_df_clean.isna().sum()
Out[36]:
beer_id                   0
beer_name                 0
beer_style                0
beer_abv                  0
review_overall            0
review_aroma              0
review_appearance         0
review_palate             0
review_taste              0
brewery_name              0
brewery_type              0
city                    169
state                   169
longitude            125764
latitude             125764
dtype: int64

169 rows have NaN values for city and state. Let's drop these rows.

But let's not worry about the longitude and latitude NaN values, because I am not sure if I will be using these columns or not yet.

In [37]:
beer_df_clean.dropna(subset=['city', 'state'], inplace=True)
C:\Users\mcmco\anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
In [38]:
beer_df_clean.isna().sum()
Out[38]:
beer_id                   0
beer_name                 0
beer_style                0
beer_abv                  0
review_overall            0
review_aroma              0
review_appearance         0
review_palate             0
review_taste              0
brewery_name              0
brewery_type              0
city                      0
state                     0
longitude            125595
latitude             125595
dtype: int64
In [39]:
beer_df_clean
Out[39]:
beer_id beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563797 rows × 15 columns

Dataframe looks good.

Last thing I want to do is some feature engineering to create the variable, weighted_review, which will represent a weighted average across all 5 review categories, review_overall, review_aroma, review_appearance, review_palate, review_taste. Refer to Preliminary Wrangling for the weights used and category definitions.

This column will act as the true review rating given by each reviewer per beer. This is the same weighted average that BeerAdvocate uses for each user generated review on their website.

In [40]:
def weighted_review(x):
    return x['review_overall']*0.2 + x['review_aroma']*0.24  + x['review_appearance']*0.06 + x['review_palate']*0.1 + x['review_taste']*0.4

beer_df_clean['weighted_review'] = beer_df_clean.apply(weighted_review, axis=1)
beer_df_clean
C:\Users\mcmco\anaconda3\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
Out[40]:
beer_id beer_name beer_style beer_abv review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude weighted_review
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738 4.32
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738 3.35
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738 3.75
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738 3.01
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738 3.68
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255 3.46
563962 58078 Shipwrecked Stout American Stout 6.5 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255 3.31
563963 58078 Shipwrecked Stout American Stout 6.5 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255 3.75
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255 3.63
563965 69416 Belgian Dubbel Dubbel 6.0 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255 3.27

563797 rows × 16 columns

In [41]:
# rearrange columns
beer_df_clean = beer_df_clean[['beer_id', 'beer_name', 'beer_style', 'beer_abv', 'weighted_review', 'review_overall',
                               'review_aroma', 'review_appearance', 'review_palate', 'review_taste', 'brewery_name', 
                               'brewery_type', 'city', 'state','longitude', 'latitude']]
beer_df_clean
Out[41]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.75 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.01 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.68 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.75 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563797 rows × 16 columns

What is the structure of your dataset?

  • After a bit of cleaning, our beer dataset is has 563,797 rows, across 16 columns.
  • The dataset includes 16 features, with a mixture of quantitative and qualitative features.
  • The values for each feature are the correct data type for our analysis. All numeric values are either floats or integers, and all categorical values are objects/strings.
  • A final note to make is that there are a number of beers that have multiple reviews. Therefore we will keep this in mind when doing bivariate and multivariate analysis. For example, if we want to create a bar chart ranking the top 10 highest rated beers, we will need to aggregate the ratings for each beer before plotting.
  • Additional brewery data was also merged into the main dataframe beer_df_clean from the Open Brewery Database. The variables we pulled from this database included brewery_type, city, state, longitude, and latitude.
  • Lastly, we have one engineered feature weighted_review, which is a weighted average of the 5 review categories that were in the original dataframe beer_df.

What is/are the main feature(s) of interest in your dataset?

CHANGE THIS STUFF BELOW The aim of the analysis for this dataset is to find out what features have the highest correlation with the overall review score of beers. I am most curious to see if there is a specific brewery, brewery location, or beer style that receives higher than average ratings. I am also curious to see how alcohol content in beer is correlated with overall rating. Fianlly, I am also interested to see which review sub-category: aroma, appearance, palate, or taste, is most highly correlated with overall rating.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

I expect the sub-category ratings to be most highly correlated with the overall rating of each beer, however there's really no saying which one will have the highest correlation. But I do believe that all of their correlations will be realtively similar, and no one category will stand out. For me, I am most interested in seeing what insights we can find in the categorical features: brewery, state, brewery type, and beer style.


Univariate Exploration

First we'll do some exploration of the quantitative variables in the dataset.

We'll take a look at: beer_abv, review_overall, review_aroma, review_appearance, review_palate, and review_taste.

In [42]:
beer_df_clean
Out[42]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.75 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.01 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.68 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.75 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563797 rows × 16 columns

In [43]:
# beer_abv distribution
sns.histplot(data=beer_df_clean, x='beer_abv');

Alright so it looks like we have some outliers for beer_abv. Let's look into these..

In [44]:
abv_stats = beer_df_clean.beer_abv.describe()
abv_stats
Out[44]:
count    563797.000000
mean          7.397345
std           2.532279
min           0.050000
25%           5.500000
50%           7.000000
75%           9.000000
max          41.000000
Name: beer_abv, dtype: float64

I am going to define outliers using the 1.5 x IQR rule.

In [45]:
# Find limits for the 1.5xIQR rule. 
IQR = abv_stats[6] - abv_stats[4]
lower_limit = abv_stats[4] - (1.5 * IQR)
upper_limit = abv_stats[6] + (1.5 * IQR)
print(f'Outlier Lower Limit = {lower_limit} \n\
Outlier Upper Limit = {upper_limit}')
Outlier Lower Limit = 0.25 
Outlier Upper Limit = 14.25

Now that we have our lower and upper limits to define the outliers, we will use these values to filter our data and then re-plot the distribution.

In [46]:
beer_df_clean.query("beer_abv < 0.25")
Out[46]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
487984 36162 Budweiser NA Low Alcohol Beer 0.05 1.57 1.5 1.5 3.5 1.0 1.5 Anheuser-Busch large Saint Louis Missouri -90.2119 38.595454
487985 36162 Budweiser NA Low Alcohol Beer 0.05 2.45 3.0 2.5 2.5 3.0 2.0 Anheuser-Busch large Saint Louis Missouri -90.2119 38.595454
In [47]:
beer_df_clean.query("beer_style == 'Low Alcohol Beer'").count()['beer_abv']
Out[47]:
277
In [48]:
beer_df_clean.query("beer_style == 'Low Alcohol Beer'")['brewery_name'].unique()
Out[48]:
array(['Anheuser-Busch'], dtype=object)

Alright so this is interesting. There are only two data points that lie below the lower limit of 0.25. We can also see that beer_style == Low Alcohol Beer. There are also 277 other data points with this beer style. However, they are all brewed by the same brewery, Anheuser-Busch.

Since low alcoholic beer has a very specific target market, it doesn't really fit within the purposes of this analysis. The purpose of this analysis is to gain insight into different beers for people who enjoy drinking it, without any thought or worry of its alcohol content. In other words, alcohol content isn't a deciding factor for choosing to drink a specific beer. So it doesn't make sense to include beers that are for people who do pay attention to alcohol content, and for them it is a deciding factor.

Having said that, I will drop all Low Alcohol Beer data from the dataset.

In [49]:
beer_df_clean = beer_df_clean.query("beer_style != 'Low Alcohol Beer'")
beer_df_clean
Out[49]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.75 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.01 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.68 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.75 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563520 rows × 16 columns

In [50]:
# check if rows were dropped. 
beer_df_clean.query("beer_style == 'Low Alcohol Beer'")
Out[50]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude

Ok, so we've taken care of the outliers below the lower limit, let's take a look at outlier above the upper limit.

In [51]:
beer_df_clean.query("beer_abv > 14.25")
Out[51]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
60234 32963 XXX Warlock Double Imperial Stout American Double / Imperial Stout 18.4 4.28 4.0 4.0 4.5 4.5 4.5 Bristol Brewing Company micro Colorado Springs Colorado -104.827401 38.810862
60247 32963 XXX Warlock Double Imperial Stout American Double / Imperial Stout 18.4 2.52 2.0 4.0 1.0 1.0 2.5 Bristol Brewing Company micro Colorado Springs Colorado -104.827401 38.810862
60248 32963 XXX Warlock Double Imperial Stout American Double / Imperial Stout 18.4 2.00 1.5 3.0 3.0 2.0 1.5 Bristol Brewing Company micro Colorado Springs Colorado -104.827401 38.810862
60249 32963 XXX Warlock Double Imperial Stout American Double / Imperial Stout 18.4 1.76 1.0 2.0 3.0 3.0 1.5 Bristol Brewing Company micro Colorado Springs Colorado -104.827401 38.810862
60250 32963 XXX Warlock Double Imperial Stout American Double / Imperial Stout 18.4 4.10 4.5 4.0 4.0 4.0 4.0 Bristol Brewing Company micro Colorado Springs Colorado -104.827401 38.810862
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
543727 56134 Barrel Of Monkeys American Barleywine 16.9 3.32 3.0 3.5 3.0 3.0 3.5 Devil's Canyon Brewing Company micro San Carlos California -122.244541 37.498966
562715 66003 Iced Coffee Stout American Double / Imperial Stout 15.0 4.47 4.5 4.5 4.0 4.5 4.5 Hill Farmstead Brewery micro Greensboro Bend Vermont NaN NaN
562716 66003 Iced Coffee Stout American Double / Imperial Stout 15.0 4.50 4.5 4.5 4.5 4.5 4.5 Hill Farmstead Brewery micro Greensboro Bend Vermont NaN NaN
562717 66003 Iced Coffee Stout American Double / Imperial Stout 15.0 4.36 4.5 4.0 5.0 4.0 4.5 Hill Farmstead Brewery micro Greensboro Bend Vermont NaN NaN
562718 66003 Iced Coffee Stout American Double / Imperial Stout 15.0 4.72 4.5 4.5 4.0 5.0 5.0 Hill Farmstead Brewery micro Greensboro Bend Vermont NaN NaN

10319 rows × 16 columns

Ok, so over 10,000 data points can be defined as an "outlier". However, we really shouldn't drop these values just because they are outliers. So let's look into some of these data points in more detail.

In [52]:
#how many data points do we have for each abv value > 14.25?
beer_df_clean.query("beer_abv > 14.25").beer_abv.value_counts().sort_index()
Out[52]:
14.50    1710
15.00      54
15.04     871
15.30      10
15.43      29
15.60      20
15.90      94
16.00      16
16.03      33
16.45     495
16.80     693
16.83     506
16.90       3
17.00       8
17.20       9
17.50     605
18.00    4276
18.10      40
18.20     184
18.40      68
21.00      22
24.00      20
25.00      31
27.00     355
28.00       3
32.00      88
41.00      76
Name: beer_abv, dtype: int64

It doesn't look like there are any values that are completely outlandish. There is a fairly consistent progression of values.

Let's take a look at the highest abv value, 41, and make sure that it is actually a real beer.

In [53]:
beer_df_clean.query("beer_abv == 41")
Out[53]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
307061 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 3.83 4.0 3.5 4.0 3.5 4.0 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307062 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 3.68 3.5 4.0 4.5 3.5 3.5 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307070 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 3.31 3.5 3.0 4.0 4.5 3.0 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307094 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 1.24 1.0 1.5 3.0 1.0 1.0 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307095 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 2.70 5.0 2.0 2.0 1.0 2.5 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
307165 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 4.12 4.0 4.5 4.0 4.0 4.0 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307166 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 4.57 4.5 5.0 4.5 4.0 4.5 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307167 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 4.62 3.5 5.0 4.5 4.5 5.0 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307168 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 4.70 4.5 4.5 4.5 4.5 5.0 BrewDog micro Canal Winchester Ohio -82.831135 39.822595
307169 57015 Sink The Bismarck! American Double / Imperial IPA 41.0 4.57 4.5 5.0 4.5 4.0 4.5 BrewDog micro Canal Winchester Ohio -82.831135 39.822595

76 rows × 16 columns

It looks like, Sink The Bismark! is a real beer, and was the strongest beer in the world when it was released.

So since there isn't a good reason to drop these values, other than that they are outliers, I think the best course of action would be to keep the values in our dataset, and just change the binrange for the distribution.

In addition, when we get to bivariate and multivariate analysis, it might be a good idea to do show results with and without these outliers.

The last thing we need to do before re-plotting our data is take into account beers with multiple reviews. This is important because beers with multiple reviews have the same beer_abv value, and are getting counted multiple times, which could be skewing the data.

Therefore, what we need to do is filter out any beers that have multiple reviews, and just plot a single beer_abv value for each one. Let's do that now.

In [54]:
abv_unique = beer_df_clean.drop_duplicates(subset=['beer_name', 'beer_style', 'beer_abv', 'brewery_name', 'city', 'state'])
abv_unique
Out[54]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
10 10789 Caldera Oatmeal Stout Oatmeal Stout 7.2 2.97 3.0 3.0 2.5 3.0 3.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
12 12386 Caldera OBF 15 American Pale Lager 5.6 3.76 4.0 3.0 4.0 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
13 58046 Rauch Ür Bock Rauchbier 7.4 4.36 4.5 4.5 3.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563955 46681 PB Porter American Porter 6.5 3.50 3.5 3.5 3.5 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563959 58076 Amber Wave American Amber / Red Ale 5.4 3.29 4.0 3.5 2.5 3.0 3.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

11095 rows × 16 columns

This dataframe, abv_unique, will now only count a single beer_abv value for each unique beer is beer_df_clean.

We can now go ahead and use abv_unique to plot the distribution.

In [55]:
# re-plot distribution for beer_abv.
sns.histplot(data=abv_unique, x='beer_abv', binwidth=0.5, binrange=(2,14));

Here, I set the binrange to the outlier range we caluclated before. Smaller bin sizes didn't give any more useful insight. Slight right skew with most data points in the 4 to 10 abv range.

Next, lets deal with the right skew of this distribution.

First, I want to take a look at a few different transformations and compare their skew. This will give us a general sense of which transformation is best to use.

In [56]:
# skew of original data. 
abv_unique.beer_abv.skew()
Out[56]:
1.785754430729601

This is the skew of the beer_abv column in abv_unique. Let's apply some transformations and see if we can get that skew value closer to 0. Since we are dealing with a right skew, we will compare roots and log transformations.

In [57]:
print(f'''square root transformation: {np.sqrt(abv_unique.beer_abv).skew()}
cube root transformation: {np.cbrt(abv_unique.beer_abv).skew()}
log transformation: {np.log10(abv_unique.beer_abv).skew()}
''')
square root transformation: 0.8709065751970357
cube root transformation: 0.6682802085586348
log transformation: 0.30037268551325036

Log transformation looks like the way to go here.

In [58]:
log_binsize = 0.035
bins = 10 ** np.arange(0.3, 1.3 + log_binsize, log_binsize)
sns.histplot(data=abv_unique, x='beer_abv', bins=bins)
plt.xscale('log')
tick_locs = [2, 5, 10, 20]
plt.xticks(tick_locs, tick_locs);

This looks a bit better now.


Now let's take a look at our review scores: weighted_review, review_overall, review_aroma, review_appearance, review_palate, and review_taste

First I want to take a look at some general details about these variables before plotting. We probably should of done this in the first section, but oh well.

In [59]:
beer_df_clean[['weighted_review', 'review_overall', 'review_aroma',
               'review_appearance', 'review_palate', 'review_taste']].describe()
Out[59]:
weighted_review review_overall review_aroma review_appearance review_palate review_taste
count 563520.000000 563520.000000 563520.000000 563520.000000 563520.000000 563520.000000
mean 3.860928 3.878524 3.826135 3.903313 3.820094 3.876857
std 0.594069 0.694011 0.665907 0.581781 0.650815 0.702981
min 1.000000 0.000000 1.000000 0.000000 1.000000 1.000000
25% 3.580000 3.500000 3.500000 3.500000 3.500000 3.500000
50% 3.970000 4.000000 4.000000 4.000000 4.000000 4.000000
75% 4.250000 4.500000 4.000000 4.000000 4.000000 4.500000
max 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000

So it looks like the variables review_overall and review_appearance have a min review score of 0. This must be an input error since the lowest review score you can give on BeerAdvocate is 1. Therefore, let's drop any rows that have review scores of 0.

In [60]:
review_zero = beer_df_clean.query('review_overall == 0')
review_zero
Out[60]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
47471 3806 Pub Pils Czech Pilsener 4.0 1.58 0.0 2.0 0.0 3.0 2.0 Desert Edge Brewery brewpub Salt Lake City Utah -111.875021 40.756391
47505 3804 Latter Days Stout Irish Dry Stout 4.0 1.96 0.0 4.0 0.0 2.0 2.0 Desert Edge Brewery brewpub Salt Lake City Utah -111.875021 40.756391
47523 3810 Utah Pale Ale American Pale Ale (APA) 4.0 1.72 0.0 3.0 0.0 2.0 2.0 Desert Edge Brewery brewpub Salt Lake City Utah -111.875021 40.756391
In [61]:
beer_df_clean.drop(index=review_zero.index, inplace=True)
C:\Users\mcmco\anaconda3\lib\site-packages\pandas\core\frame.py:3997: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
In [62]:
# check that rows were dropped.
beer_df_clean.query('review_overall == 0')
Out[62]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude

Now with that taken care of, let's start with plotting the distribution for weighted_review, which again is a weighted average score for the other 5 review categories: review_overall, review_aroma, review_appearance, review_palate, and review_taste.

In [63]:
sns.histplot(data=beer_df_clean, x='weighted_review', binwidth=0.1)
plt.title('Weighted Average Review Score Distribution')
plt.xlabel('Review score');

Unimodal distribution with a left skew, sweet. Let's move onto the other 5 review categories.

In [64]:
fig, (ax1, ax2, ax3) = plt.subplots(3,2, figsize=(12,15)) 

sns.histplot(data=beer_df_clean, x='review_overall', ax=ax1[0], binwidth=0.5)
ax1[0].set_title('Overall Review Score Distribution')
ax1[0].set_xlabel('Review score')

sns.histplot(data=beer_df_clean, x='review_aroma', ax=ax1[1], binwidth=0.5)
ax1[1].set_title('Aroma Review Score Distribution')
ax1[1].set_xlabel('Review score')

sns.histplot(data=beer_df_clean, x='review_appearance', ax=ax2[0], binwidth=0.5)
ax2[0].set_title('Appearance Review Score Distribution')
ax2[0].set_xlabel('Review score')

sns.histplot(data=beer_df_clean, x='review_palate', ax=ax2[1], binwidth=0.5)
ax2[1].set_title('Palate Review Score Distribution')
ax2[1].set_xlabel('Review score')

sns.histplot(data=beer_df_clean, x='review_taste', ax=ax3[0], binwidth=0.5)
ax3[0].set_title('Taste Review Score Distribution')
ax3[0].set_xlabel('Review score')

ax3[1].axis('off')
plt.tight_layout()

The distributions for the these sub-category review scores have a similar distribution and skew to the weighted review scores.

This tells us a couple of things. One, these sub-categories will likely have a strong correlation with each other. Two, we will most likely want to apply the same transformation to each of these distributions. A sqaured or cubed transformation would make sense since we are dealing with a left skew.

We will most likely have to use matplotlib to create our own scale function, since it doesn't have a squared or cubed scale built in.

Another thing to note here is the binsize change. By chance, every review score in the dataset was given in an interval of 0.5, i.e. 0.5, 1.0, 1.5, etc. However, on BeerAdvocate a reviewer can give any score between 1-5, including decimals. Because of this, I have been treating these 5 review variables as quantitative variables, even though their is an argument to say that they are qualitative.

Next, let's compare the skew of squared and cubed transformations for each review category, and then will work on the scale transformation.

In [65]:
# Overall review skew comparison.
print(f'''Weighted Average Reviews
------------------
original data skew: {beer_df_clean.weighted_review.skew()}
squared transformation: {np.square(beer_df_clean.weighted_review).skew()}
cubed transformation: {np.power(beer_df_clean.weighted_review, 3).skew()}

Overall Reviews
------------------
original data skew: {beer_df_clean.review_overall.skew()}
squared transformation: {np.square(beer_df_clean.review_overall).skew()}
cubed transformation: {np.power(beer_df_clean.review_overall, 3).skew()}

Aroma Reviews
------------------
original data skew: {beer_df_clean.review_aroma.skew()}
squared transformation: {np.square(beer_df_clean.review_aroma).skew()}
cubed transformation: {np.power(beer_df_clean.review_aroma, 3).skew()}

Appearance Reviews
------------------
original data skew: {beer_df_clean.review_appearance.skew()}
squared transformation: {np.square(beer_df_clean.review_appearance).skew()}
cubed transformation: {np.power(beer_df_clean.review_appearance, 3).skew()}

Palate Reviews
------------------
original data skew: {beer_df_clean.review_palate.skew()}
squared transformation: {np.square(beer_df_clean.review_palate).skew()}
cubed transformation: {np.power(beer_df_clean.review_palate, 3).skew()}

Taste Reviews
------------------
original data skew: {beer_df_clean.review_taste.skew()}
squared transformation: {np.square(beer_df_clean.review_taste).skew()}
cubed transformation: {np.power(beer_df_clean.review_taste, 3).skew()}
''')
Weighted Average Reviews
------------------
original data skew: -1.1984672927072069
squared transformation: -0.5047808852203373
cubed transformation: -0.01443767612316318

Overall Reviews
------------------
original data skew: -1.1060441322918542
squared transformation: -0.34339252456404773
cubed transformation: 0.1800702843091319

Aroma Reviews
------------------
original data skew: -0.8914884091128868
squared transformation: -0.18199327912184646
cubed transformation: 0.3183647592576002

Appearance Reviews
------------------
original data skew: -0.9742304877643074
squared transformation: -0.2373280150245303
cubed transformation: 0.2849302728884694

Palate Reviews
------------------
original data skew: -0.9445654223236041
squared transformation: -0.22059641406222286
cubed transformation: 0.3025889741414878

Taste Reviews
------------------
original data skew: -1.0834602672817597
squared transformation: -0.3509271551865234
cubed transformation: 0.1534563848648996

Based on these skew values a cubed transformation looks like the best option across all review categories.

So let's just go ahead and create a cubed scale for the x-axis of the review distributions. Matplotlib doesn't have a 'cubed' option for plt.xscale, but we can use the 'function' option of set_xscale to apply our own function. Pretty cool.

Before we apply a cubed scale to the review distributions, I want to test the code out on some dummy data.

In [66]:
from matplotlib.ticker import FixedLocator

# define cubed function to apply to scale. 
def forward(x):
    return x**3


def inverse(x):
    return np.sign(x) * (np.abs(x)) ** (1 / 3)


# plot dummy data using defined cubed function. 
dummy_data = np.cbrt(np.arange(0,9))
fig, ax = plt.subplots(1,1, figsize=(6,5))
ax.plot(dummy_data, label='$\sqrt[3]{x}$')
ax.set_yscale('function', functions=(forward, inverse))
ax.yaxis.set_major_locator(FixedLocator(np.arange(0, 10, 1)**(1/3)))
ax.legend(fontsize=20);
In [67]:
# plotted y-values to help with interpretation. 
np.cbrt(np.arange(0,9))
Out[67]:
array([0.        , 1.        , 1.25992105, 1.44224957, 1.58740105,
       1.70997595, 1.81712059, 1.91293118, 2.        ])

That worked nicely.

We purposefully plotted a cubed root function so that the resulting plot would be linear after applying the cubed function to the y-scale. It is much easier to interpret what is going on with this visualization.

If you take a look at the plot without the y-scale function applied, you can see the effect of the cubed function.

In [68]:
dummy_data = np.cbrt(np.arange(0,9))
fig, ax = plt.subplots(1,1, figsize=(6,5))
ax.plot(dummy_data);

Now, let's try to apply this function to the review distributions.

In [69]:
from matplotlib.ticker import FixedLocator

# define cubed function to apply to scale. 
def forward(x):
    return x**3


def inverse(x):
    return np.sign(x) * (np.abs(x)) ** (1 / 3)


binsize = 5
bins = np.arange(0, 130, binsize)**(1/3)

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(15,6))

sns.histplot(beer_df_clean, x='weighted_review', bins=bins, ax=ax1)
ax1.set_xscale('function', functions=(forward, inverse))
ax1.xaxis.set_major_locator(FixedLocator(np.arange(0, 130, 25)**(1/3)))
ax1.set_title('FixedLocator Ticks')
ax1.set_xlabel('Review score')

sns.histplot(beer_df_clean, x='weighted_review', bins=bins, ax=ax2)
ax2.set_xscale('function', functions=(forward, inverse))
tick_locs1 = [1, 2, 2.5, 3, 3.5, 4, 4.5, 5]
plt.xticks(tick_locs1, tick_locs1)
ax2.set_title('True Value Ticks')
ax2.set_xlabel('Review score')

plt.suptitle('Weighted Average Review Score Distribution', fontsize=15);

Original distribution for comparison. image.png

Right away, we can see that the cubed transformation does reduce the skew of the distribution.

I included two plots of the transformation for comparison:

  • ax1: uses the matplotlib FixedLocator subclass from the ticker module to set tick marks. This evenly distributes the tick marks across the bins of the distribution. It is more aesthetically pleasing, however it can cause some difficultly in interpretation, since our review scores are in intervals of 0.5.
  • ax2: uses plt.xticks to set tick marks that correspond to the true values of review scores, i.e. intervals of 0.5. The spacing of the tick marks on this plot can also make it difficult to interpret. However, we get the benefit of visualizing values that make sense to us.

Now we can apply the cubed transformation to the rest of the review categories.

In [70]:
fig, axes = plt.subplots(5,2, figsize=(12,25), sharey=True)
categories = ['review_overall', 'review_aroma', 'review_appearance', 'review_palate', 'review_taste']

for i, row in enumerate(axes):
    binsize = 25
    bins = np.arange(0, 130, binsize)**(1/3)
    
    # FixedLocator ticks plot
    sns.histplot(beer_df_clean, x=categories[i], bins=bins, ax=row[0])
    row[0].set_xscale('function', functions=(forward, inverse))
    row[0].xaxis.set_major_locator(FixedLocator(np.arange(0, 130, 25)**(1/3)))
    row[0].set_xlabel('Review Score')
    row[0].set_title('FixedLocator Ticks')
    
    # True value ticks plot
    sns.histplot(beer_df_clean, x=categories[i], bins=bins, ax=row[1])
    row[1].set_xscale('function', functions=(forward, inverse))
    tick_locs = [1, 2, 2.5, 3, 3.5, 4, 4.5, 5]
    row[1].set_xticks(tick_locs)
    row[1].set_xticklabels(tick_locs)
    row[1].set_xlabel('Review Score')
    row[1].set_title('True Value Ticks')

# plt.subplots_adjust(hspace=3)
plt.figtext(0.38, 1, 'Overall Review Score Distribution', fontsize=15, weight=20)
plt.figtext(0.38, 0.8, 'Aroma Review Score Distribution', fontsize=15, weight=20)
plt.figtext(0.38, 0.6, 'Appearance Review Score Distribution', fontsize=15, weight=20)
plt.figtext(0.38, 0.4, 'Palate Review Score Distribution', fontsize=15, weight=20)
plt.figtext(0.38, 0.19, 'Taste Review Score Distribution', fontsize=15, weight=20)
plt.tight_layout(h_pad=5)

Alright, so now we have addressed the skew of each review distribution. Keep in mind that normalizing the distributions like this isn't always necessary for your analysis. However, I figured that it would be good practice, and it may be helpful later on.


Lastly, I want to take a look at a few qualitative variables in our dataset.

I want to take a look at the distributions of states where each beer is brewed, beer name, beer style, and brewery type.

In [71]:
# Let's see how many different states we are working with. 
beer_df_clean.state.nunique()
Out[71]:
51

Alright, we have all 50 states and D.C. represented in the dataset. It will be a tight squeeze, but I think it will be worth looking at the full distribution.

We also have to remember that each brewery may brew multiple beers, and each beer may have multiple reviews. Therfore each state will be counted multiple times. To deal with this, we can drop all brewery duplicates, and then plot the states where those breweries are located.

In [72]:
state_unique = beer_df_clean.drop_duplicates(subset=['brewery_name', 'brewery_type', 'city', 'state'])
state_unique
Out[72]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
886 55097 Nut Brown English Brown Ale 4.7 3.63 4.0 3.5 4.0 3.5 3.5 Broad Ripple Brew Pub brewpub Indianapolis Indiana NaN NaN
1015 33624 Hoppin' To Heaven IPA American IPA 6.8 4.00 4.0 4.0 4.0 4.0 4.0 Hoppin' Frog Brewery micro Akron Ohio NaN NaN
4137 53679 Ballz Deep Double IPA American Double / Imperial IPA 8.4 4.00 4.0 4.0 4.0 4.0 4.0 7 Seas Brewery and Taproom micro Tacoma Washington -122.439038 47.242614
4179 48911 Founders Cerise Fruit / Vegetable Beer 6.5 4.35 4.5 4.0 4.0 4.5 4.5 Founders Brewing Company regional Grand Rapids Michigan -85.676575 42.952496
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563834 74835 Pale Ale American Pale Ale (APA) 6.3 3.97 4.0 4.0 3.5 4.0 4.0 The Bronx Brewery micro Bronx New York -73.910641 40.801861
563836 60269 Orkney Porter English Porter 9.0 4.35 4.0 4.5 4.5 4.0 4.5 Highland Brewing Company Ltd. regional Asheville North Carolina NaN NaN
563855 76106 Stone's Throw Scottish Ale Scottish Ale 4.5 3.41 3.5 3.0 4.0 3.5 3.5 Fargo Brewing Company micro Fargo North Dakota NaN NaN
563861 56470 Libertine Witbier 7.0 4.18 4.5 3.5 4.0 4.0 4.5 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563939 73587 Hopulent American Double / Imperial IPA 8.5 2.13 1.5 3.0 3.5 3.0 1.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

583 rows × 16 columns

In [73]:
state_unique.state.value_counts()
Out[73]:
California              81
Oregon                  34
Washington              31
New York                31
Michigan                29
Colorado                25
Pennsylvania            22
Wisconsin               21
North Carolina          20
Missouri                19
Massachusetts           18
Illinois                18
Minnesota               16
Florida                 15
Indiana                 15
Montana                 15
Texas                   14
Ohio                    14
Virginia                11
Arizona                  9
Maryland                 9
New Jersey               7
Maine                    7
South Carolina           6
Nebraska                 6
Nevada                   6
New Hampshire            5
Georgia                  5
Vermont                  5
Kentucky                 5
Hawaii                   4
Alabama                  4
Idaho                    4
New Mexico               4
Iowa                     4
Utah                     4
Rhode Island             4
Tennessee                4
Oklahoma                 4
Alaska                   4
Wyoming                  3
Louisiana                3
District of Columbia     3
Kansas                   3
South Dakota             2
Connecticut              2
Arkansas                 2
West Virginia            2
Delaware                 2
Mississippi              1
North Dakota             1
Name: state, dtype: int64
In [74]:
# still dealing with 51 states?
state_unique.state.nunique()
Out[74]:
51

Now, we can go ahead and plot this values.

In [75]:
plt.figure(figsize=(12,14))
sns.countplot(data=state_unique, y='state', palette='Set3');

Let's now try to put these in descending order.

In [76]:
state_order = state_unique.state.value_counts().index
plt.figure(figsize=(12,14))
sns.countplot(data=state_unique, y='state', palette='Blues_r', order=state_order);

Last thing I want to do is to annotate the value counts for each state on the plot.

In [77]:
plt.figure(figsize=(12,14))
ax = sns.countplot(data=state_unique, y='state', palette='Blues_r', order=state_order)
plt.xlim(0,90)
plt.title('Most Popular Beer States')
sns.despine();

for p in ax.patches:
    bar_end = p.get_width()
    x = p.get_x() + p.get_width() + 0.3
    y = p.get_y() + p.get_height() / 1.35
    ax.annotate(format(bar_end, ',d'), (x,y))

Alright this looks better, and it gives us an idea of where a majority of breweries are located in the U.S., which may have some indication on the popularity of beer in that state.

We can see that California is a clear standout, with states like Oregon, New York, Washington, and Michigan also having a high number of breweries. It will be interesting to see which of these states also produce highly rated beers.


Now let's take a look at the most popular beer styles in the dataset.

With beer styles, we face the same problem of duplicates as we did with brewery states. Since each beer can have multiple reviews, beer styles may be counted multiple times for a single beer. Therefore, in order to get accurate counts for our plot, we will need to drop these duplicates.

In [78]:
style_unique = beer_df_clean.drop_duplicates(subset=['beer_name', 'beer_style', 'brewery_name'])
style_unique
Out[78]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
10 10789 Caldera Oatmeal Stout Oatmeal Stout 7.2 2.97 3.0 3.0 2.5 3.0 3.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
12 12386 Caldera OBF 15 American Pale Lager 5.6 3.76 4.0 3.0 4.0 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
13 58046 Rauch Ür Bock Rauchbier 7.4 4.36 4.5 4.5 3.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563955 46681 PB Porter American Porter 6.5 3.50 3.5 3.5 3.5 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563959 58076 Amber Wave American Amber / Red Ale 5.4 3.29 4.0 3.5 2.5 3.0 3.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

10221 rows × 16 columns

Looks good, but its possible that we are missing beer style counts where 2 or more breweries have the same name, but are seperate breweries. Either because they are different brewery types, or are located in different cities/states.

So we will include breweries in our subset columns just to cover these edge cases.

In [79]:
style_unique = beer_df_clean.drop_duplicates(subset=['beer_name', 'beer_style', 
                                                     'brewery_name', 'brewery_type', 'city', 'state'])
style_unique
Out[79]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
10 10789 Caldera Oatmeal Stout Oatmeal Stout 7.2 2.97 3.0 3.0 2.5 3.0 3.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
12 12386 Caldera OBF 15 American Pale Lager 5.6 3.76 4.0 3.0 4.0 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
13 58046 Rauch Ür Bock Rauchbier 7.4 4.36 4.5 4.5 3.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563955 46681 PB Porter American Porter 6.5 3.50 3.5 3.5 3.5 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563959 58076 Amber Wave American Amber / Red Ale 5.4 3.29 4.0 3.5 2.5 3.0 3.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

11180 rows × 16 columns

Ok, so it looks like our worries were justified since the number of rows returned increased by almost 1,000 to 11,180.

Now, let's go ahead and plot.

In [80]:
pd.Series(style_unique.beer_style.value_counts().values).describe()
Out[80]:
count    101.000000
mean     110.693069
std      135.536249
min        1.000000
25%       27.000000
50%       70.000000
75%      126.000000
max      854.000000
dtype: float64

Conveniently there are 101 unique beer styles in style_unique. So, we'll take a look at the upper quartile, which is about the top 25 beer styles.

In [81]:
# find beer styles in upper quartile. 
top25_styles = style_unique.beer_style.value_counts().head(25).index
top25_styles
Out[81]:
Index(['American IPA', 'American Pale Ale (APA)', 'American Porter',
       'American Amber / Red Ale', 'American Double / Imperial IPA',
       'Saison / Farmhouse Ale', 'American Double / Imperial Stout',
       'American Stout', 'American Brown Ale', 'Fruit / Vegetable Beer',
       'American Barleywine', 'American Strong Ale', 'American Wild Ale',
       'American Pale Wheat Ale', 'Hefeweizen', 'Russian Imperial Stout',
       'American Blonde Ale', 'Witbier', 'Belgian Pale Ale',
       'Extra Special / Strong Bitter (ESB)', 'Tripel',
       'Scotch Ale / Wee Heavy', 'Märzen / Oktoberfest', 'Oatmeal Stout',
       'Belgian Strong Dark Ale'],
      dtype='object')
In [82]:
style_unique.beer_style.value_counts().head(25)
Out[82]:
American IPA                           854
American Pale Ale (APA)                641
American Porter                        462
American Amber / Red Ale               453
American Double / Imperial IPA         401
Saison / Farmhouse Ale                 369
American Double / Imperial Stout       356
American Stout                         294
American Brown Ale                     281
Fruit / Vegetable Beer                 276
American Barleywine                    250
American Strong Ale                    242
American Wild Ale                      241
American Pale Wheat Ale                217
Hefeweizen                             203
Russian Imperial Stout                 193
American Blonde Ale                    193
Witbier                                184
Belgian Pale Ale                       172
Extra Special / Strong Bitter (ESB)    160
Tripel                                 147
Scotch Ale / Wee Heavy                 145
Märzen / Oktoberfest                   140
Oatmeal Stout                          139
Belgian Strong Dark Ale                133
Name: beer_style, dtype: int64
In [83]:
plt.figure(figsize=(9,10))
ax = sns.countplot(data=style_unique, y='beer_style', order=top25_styles, palette='Blues_r')
plt.xlim(0, 900)
plt.title('Top 25 Most Popular Styles of Beer')
sns.despine();

for p in ax.patches:
    bar_end = p.get_width()
    x = p.get_x() + p.get_width() + 5
    y = p.get_y() + p.get_height() / 1.4
    ax.annotate(format(bar_end, ',d'), (x,y))

Looks good.

Next, let's move onto types of breweries.


First I want to define each brewery type, as defined by Brewer's Association:

  • Large: A brewery with an annual beer production greater than 6,000,000 barrels.
  • Regional: A brewery with an annual beer production of between 15,000 and 6,000,000 barrels.
  • Micro: A brewery that produces less than 15,000 barrels of beer per year and sells 75 percent or more of its beer off-site.
  • Brewpub: A restaurant-brewery that sells 25 percent or more of its beer on-site and operates significant food services.
  • Contract: A business that hires another brewery to produce its beer. It can also be a brewery that hires another brewery to produce additional beer.
  • Proprietor: A licensed tenant brewery that physically takes possession of a shared brewery while brewing.
  • Planning: Any independent or home brewery that is planning to expand their brewery into a brewpub, microbrewery, etc.

Next, let's filter for unique breweries so that we aren't recounting brewery types. To insure that we don't run into a similar problem like we did with beers types, where several beers have the same name and beer style combination, we will filter for brewery types with a unique, brewery_name, brewery_type, city, and state.

Filtering with these subset columns will make sure that we only count brewery types with a unique brewery name, in a unique city, and in a unique state. This maybe to a bit overkill, however it will deal with certain edge cases, for example:

  • Two breweries named 'X brewery', are both brewpubs, and are both located in New York.
  • However, one of the breweries is in NYC, New York, and the other is in Albany, New York.
  • In this case, we want this to be counted as 2 brewpubs. Having the four subset columns listed above will do this for us.

With that said, let's filter and plot.

In [84]:
brewery_unique = beer_df_clean.drop_duplicates(subset=['brewery_name', 'brewery_type', 'city', 'state'])
brewery_unique
Out[84]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
886 55097 Nut Brown English Brown Ale 4.7 3.63 4.0 3.5 4.0 3.5 3.5 Broad Ripple Brew Pub brewpub Indianapolis Indiana NaN NaN
1015 33624 Hoppin' To Heaven IPA American IPA 6.8 4.00 4.0 4.0 4.0 4.0 4.0 Hoppin' Frog Brewery micro Akron Ohio NaN NaN
4137 53679 Ballz Deep Double IPA American Double / Imperial IPA 8.4 4.00 4.0 4.0 4.0 4.0 4.0 7 Seas Brewery and Taproom micro Tacoma Washington -122.439038 47.242614
4179 48911 Founders Cerise Fruit / Vegetable Beer 6.5 4.35 4.5 4.0 4.0 4.5 4.5 Founders Brewing Company regional Grand Rapids Michigan -85.676575 42.952496
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563834 74835 Pale Ale American Pale Ale (APA) 6.3 3.97 4.0 4.0 3.5 4.0 4.0 The Bronx Brewery micro Bronx New York -73.910641 40.801861
563836 60269 Orkney Porter English Porter 9.0 4.35 4.0 4.5 4.5 4.0 4.5 Highland Brewing Company Ltd. regional Asheville North Carolina NaN NaN
563855 76106 Stone's Throw Scottish Ale Scottish Ale 4.5 3.41 3.5 3.0 4.0 3.5 3.5 Fargo Brewing Company micro Fargo North Dakota NaN NaN
563861 56470 Libertine Witbier 7.0 4.18 4.5 3.5 4.0 4.0 4.5 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563939 73587 Hopulent American Double / Imperial IPA 8.5 2.13 1.5 3.0 3.5 3.0 1.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

583 rows × 16 columns

In [85]:
brewery_unique.brewery_type.value_counts()
Out[85]:
micro         247
brewpub       215
regional       71
contract       22
large          18
planning        7
proprietor      3
Name: brewery_type, dtype: int64
In [86]:
brewery_order = brewery_unique.brewery_type.value_counts().index
plt.figure(figsize=(9,8))
ax = sns.countplot(data=brewery_unique, y='brewery_type', order=brewery_order, palette='Blues_r')
plt.title('Most Popular Types of Breweries')
sns.despine()

for p in ax.patches:
    bar_end = p.get_width()
    x = p.get_x() + p.get_width() + 2
    y = p.get_y() + p.get_height() / 1.8
    ax.annotate(format(bar_end, ',d'), (x,y))

Alright, so we can see that a majority of breweries reviewed are micro breweries and brewpubs. This seems makes sense, since people who visit or drink beer from micro breweries and/or brewpubs are more likely to be beer enthusiasts who will have an opinion, i.e. a review, on the beers they drink.

Oh, I almost forgot that I wanted to look at the most popular beers based on number of reviews.

In [87]:
# plot top 25 most popular beers
beer_df_clean.beer_name.value_counts().head(25)
Out[87]:
Sculpin India Pale Ale                           8112
Founders Breakfast Stout                         5004
Big Eye IPA                                      4512
Founders KBS (Kentucky Breakfast Stout)          3864
Great Lakes Edmund Fitzgerald Porter             3330
90 Minute IPA                                    3290
Victory At Sea Coffee Vanilla Imperial Porter    3174
Founders Red's Rye PA                            2810
Founders Centennial IPA                          2790
Two Hearted Ale                                  2728
Green Flash West Coast I.P.A.                    2572
Founders Dirty Bastard                           2514
60 Minute IPA                                    2475
Furious                                          2448
Brooklyn Black Chocolate Stout                   2447
Bell's Hopslam Ale                               2443
Pale Ale                                         2422
Samuel Adams Boston Lager                        2418
Founders Double Trouble                          2328
Founders Imperial Stout                          2156
Founders Devil Dancer                            2132
Founders Backwoods Bastard                       2040
Racer 5 India Pale Ale                           1971
Dorado Double IPA                                1956
Great Lakes Blackout Stout                       1914
Name: beer_name, dtype: int64
In [88]:
# plot order
top25_beers_order = beer_df_clean.beer_name.value_counts().head(25).index
In [89]:
plt.figure(figsize=(9,10))
ax = sns.countplot(data=beer_df_clean, y='beer_name', order=top25_beers_order, palette='Blues_r')
plt.xlim(0, 9000)
plt.title('Top 25 Most Popular Beers')
sns.despine();

for p in ax.patches:
    bar_end = p.get_width()
    x = p.get_x() + p.get_width() + 100
    y = p.get_y() + p.get_height() / 1.5
    ax.annotate(format(bar_end, ',d'), (x,y))

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

The first quantitative variable we looked at was beer_abv. There were some outliers present that we defined using the 1.5xIQR rule. We dropped the outliers that laid below the lower bound, because they represented "Low Alcohol Beer", which aren't of interest in our analysis. No outliers were dropped above the upper bound since we still needed their review data. In addition to outliers, the beer_abv distribution had a slight right skew, which was corrected with a log transformation.
The variables, review_overall, review_aroma, review_appearance, review_palate, and review_taste all had left skews, which was corrected with a cubed transformation.

With the qualitative data, state, beer_style, brewery_type, and beer_name, we made count plots ranked from most frequent to least frequent value. For beer_style and beer_name, we only plotted the top 25 values, since these variables each had thousands of unique values.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Of the features investigated, the only operation that was necessary was the creation of temp tables (state_unique, style_unique, brewery_unique) used for the state, beer_style, and brewery_type varibales respectively. These temp tables were needed in order to prevent overcounting the frequency of each values, since our dataset has multiple reviews for each unique beer. A quick example: Beer A is an American IPA and has 30 different reviews, therefore represents 30 rows in the dataset. If we want to count the frequency of each beer stlye in the dataset, then we want to count 1 American IPA for Beer A instead of 30.


Bivariate Exploration

Let's start the bivariate exploration with looking at quantitative correlations.

In [90]:
variables = ['beer_abv', 'weighted_review', 'review_overall', 'review_aroma', 'review_appearance', 'review_palate', 'review_taste']

corr_matrix = beer_df_clean[variables].corr()
corr_matrix.style.background_gradient(cmap='Blues').set_caption('Correlation coefficients of qualitative variables.').format( '{:0.2f}')
Out[90]:
Correlation coefficients of qualitative variables.
beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste
beer_abv 1.00 0.27 0.10 0.31 0.24 0.27 0.26
weighted_review 0.27 1.00 0.86 0.83 0.62 0.80 0.95
review_overall 0.10 0.86 1.00 0.60 0.48 0.68 0.78
review_aroma 0.31 0.83 0.60 1.00 0.53 0.59 0.70
review_appearance 0.24 0.62 0.48 0.53 1.00 0.54 0.52
review_palate 0.27 0.80 0.68 0.59 0.54 1.00 0.71
review_taste 0.26 0.95 0.78 0.70 0.52 0.71 1.00

This simple pandas correaltion matrix serves its purpose. I also didn't know that pandas had a styling method, so I really wanted to try that out too.

Included in this matrix is weighted_review, which obviously has higher correlation coefficients than the other review categories since it is a function of those categories. Therefore, in terms of correlations, review_overall is the primary variable of interest. By looking at the correlation coefficients of variables in relation to review_overall, this will give us some insight into what categories influence a reviewer the most when giving their overall impression, or review_overall score, of a beer.

Since we are primarily interested in the correlation coefficients of review_overall, I think it would be useful to plot these values on their own.

In [91]:
data_corr = corr_matrix['review_overall'].drop(['review_overall', 'weighted_review']).reset_index().sort_values('review_overall', ascending=False)

plt.figure(figsize=(8,6))
ax = sns.barplot(data = data_corr, x='review_overall', y='index', palette='colorblind')
plt.title('Correlation with Overall Review', pad=15)
plt.ylabel('')
plt.xlabel('Correlation Coefficient', labelpad=10)
plt.xlim(0,0.85)
sns.despine();

for p in ax.patches:
    bar_end = p.get_width()
    x = p.get_x() + p.get_width() + 0.01
    y = p.get_y() + p.get_height() / 1.8
    ax.annotate(format(bar_end, '0.2f'), (x,y))

Here we can see that review_taste has the highest correlation with overall review score, and beer_abv has the lowest correlation with overall review score. This suggests that a beer's taste is the primary factor in determining a reviewer's overall impression of a beer. This makes sense.


Next I want to take a look at some realtionships between weighted_review and other qualitative variables, such as: beer_style, beer_name, brewery_name, state, brewery_type.

One important note to make is that any beer that is rated on BeerAdvocate needs at least 10 reviews to be ranked on the website. Therefore, we will keep that same threshold for this analysis.

In [92]:
# Are there any beer styles that have less than 10 reviews? 
x = beer_df_clean.groupby('beer_style').count().weighted_review
x[x<10]
Out[92]:
beer_style
Gueuze                 8
Japanese Rice Lager    3
Name: weighted_review, dtype: int64

Japanese Rice Lager and Gueuze are the only beer styles that have less than 10 reviews. So we'll keep this in mind, and deal with them if they has one of the higher average ratings.

Let's start plotting weighted_review vs. beer_style.

In [93]:
# top 25 rated beer styles 
top25_beer_styles = beer_df_clean.groupby('beer_style').weighted_review.agg(['count', 'mean'])\
                    .sort_values(by=['mean'], ascending=False).reset_index().head(25)
top25_beer_styles
Out[93]:
beer_style count mean
0 Lambic - Unblended 26 4.265385
1 American Double / Imperial Stout 32248 4.245603
2 Russian Imperial Stout 21126 4.151372
3 Flanders Red Ale 318 4.118711
4 Rye Beer 8219 4.097689
5 American Double / Imperial IPA 41387 4.090373
6 Eisbock 189 4.072963
7 Baltic Porter 4155 4.050171
8 Flanders Oud Bruin 860 4.045628
9 Wheatwine 2016 4.045079
10 American IPA 63167 4.041752
11 American Barleywine 9269 3.999906
12 Roggenbier 49 3.999592
13 Quadrupel (Quad) 5111 3.998581
14 American Porter 29494 3.997195
15 Scotch Ale / Wee Heavy 8245 3.991608
16 Gueuze 8 3.988750
17 Belgian IPA 3749 3.983406
18 Gose 31 3.974839
19 Kvass 75 3.974000
20 Foreign / Export Stout 936 3.964498
21 American Strong Ale 9932 3.962840
22 American Wild Ale 6362 3.960046
23 Weizenbock 2176 3.959885
24 Dortmunder / Export Lager 2933 3.959632

I'm also interested in displaying the number of reviews with each average rating, similar to a standard review site, which is why I included a count column. However, I will save that until later for explanatory visualizations.

So in fact Gueuze beer did make it to the top 25, therefore we'll have to deal with this quickly.

In [94]:
top25_beer_styles = beer_df_clean.groupby('beer_style').weighted_review.agg(['count', 'mean'])\
                    .sort_values(by=['mean'], ascending=False).reset_index()
top25_beer_styles = top25_beer_styles[top25_beer_styles['count']>9].head(25)
top25_beer_styles
Out[94]:
beer_style count mean
0 Lambic - Unblended 26 4.265385
1 American Double / Imperial Stout 32248 4.245603
2 Russian Imperial Stout 21126 4.151372
3 Flanders Red Ale 318 4.118711
4 Rye Beer 8219 4.097689
5 American Double / Imperial IPA 41387 4.090373
6 Eisbock 189 4.072963
7 Baltic Porter 4155 4.050171
8 Flanders Oud Bruin 860 4.045628
9 Wheatwine 2016 4.045079
10 American IPA 63167 4.041752
11 American Barleywine 9269 3.999906
12 Roggenbier 49 3.999592
13 Quadrupel (Quad) 5111 3.998581
14 American Porter 29494 3.997195
15 Scotch Ale / Wee Heavy 8245 3.991608
17 Belgian IPA 3749 3.983406
18 Gose 31 3.974839
19 Kvass 75 3.974000
20 Foreign / Export Stout 936 3.964498
21 American Strong Ale 9932 3.962840
22 American Wild Ale 6362 3.960046
23 Weizenbock 2176 3.959885
24 Dortmunder / Export Lager 2933 3.959632
25 Saison / Farmhouse Ale 13624 3.947253

There we go.

Let's get a dataframe to plot from.

In [95]:
beer_style_filter = top25_beer_styles.beer_style.values

top25_beerstyles_df = beer_df_clean.query("beer_style in @beer_style_filter")
top25_beerstyles_df
Out[95]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
58 10788 Pilot Rock Porter American Porter 5.8 4.70 4.5 4.5 4.5 4.5 5.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
220 10788 Pilot Rock Porter American Porter 5.8 4.05 4.0 4.0 4.0 4.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
221 10788 Pilot Rock Porter American Porter 5.8 4.22 4.5 3.5 3.0 5.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
222 10788 Pilot Rock Porter American Porter 5.8 4.30 4.5 4.0 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563948 73588 St. Sideburn (Bourbon Barrel Aged) American Strong Ale 8.5 2.01 1.5 2.5 3.5 3.0 1.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563955 46681 PB Porter American Porter 6.5 3.50 3.5 3.5 3.5 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563956 46681 PB Porter American Porter 6.5 3.68 4.0 3.5 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563957 46681 PB Porter American Porter 6.5 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563958 46681 PB Porter American Porter 6.5 3.60 4.0 3.5 3.5 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

265697 rows × 16 columns

In [96]:
# plot top 25 beer styles
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

sns.pointplot(data=top25_beerstyles_df, x='weighted_review', y='beer_style', order=beer_style_filter)

plt.grid(True)
plt.title('Top 25 Beer Styles by Average Review Score', fontsize=14)
plt.ylabel('Beer Styles')
plt.xlabel('Average Review Score')
plt.yticks(weight=600);

Decided to use a pointplot since we aren't dealing with any 0 values and its less noisey than a violin or box plot.

The pointplot is also nice because it gives us an indication of the number of reviews a certain style has through the confidence interval of each point. Those with larger error bars will tend to have a lower number of reviews, and vice versa.


Next let's take a look at weighted_review vs. beer_name.

In [97]:
# first filter out beers with <10 reviews.

beer_filter = beer_df_clean.groupby('beer_name').count().weighted_review
beer_filter1 = beer_filter[beer_filter<10].index
top25_beers_df = beer_df_clean.query("beer_name not in @beer_filter1")
top25_beers_df
Out[97]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
13 58046 Rauch Ür Bock Rauchbier 7.4 4.36 4.5 4.5 3.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
14 58046 Rauch Ür Bock Rauchbier 7.4 3.90 4.0 4.0 4.0 3.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
15 58046 Rauch Ür Bock Rauchbier 7.4 4.07 4.0 4.5 4.0 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
16 58046 Rauch Ür Bock Rauchbier 7.4 4.57 4.5 5.0 4.5 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
17 58046 Rauch Ür Bock Rauchbier 7.4 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563913 54635 Solidarity English Dark Mild Ale 3.8 2.81 2.5 2.5 3.5 3.0 3.0 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563914 54635 Solidarity English Dark Mild Ale 3.8 4.07 4.5 4.0 3.5 4.0 4.0 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563915 54635 Solidarity English Dark Mild Ale 3.8 4.08 5.0 3.5 4.0 4.0 4.0 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563916 54635 Solidarity English Dark Mild Ale 3.8 3.85 4.0 4.0 4.0 4.5 3.5 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563917 54635 Solidarity English Dark Mild Ale 3.8 4.33 4.5 4.0 4.5 4.0 4.5 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731

545227 rows × 16 columns

In [98]:
# Top 25 rated beers
top25_beers = top25_beers_df.groupby('beer_name').weighted_review.agg(['count', 'mean'])\
                    .sort_values(by=['mean'], ascending=False).reset_index().head(25)
top25_beers
Out[98]:
beer_name count mean
0 Rare Bourbon County Stout 249 4.649317
1 Green Flash Bourbon Barrel Aged Double Stout 14 4.641429
2 Founders CBS Imperial Stout 1274 4.616342
3 Three Sheets Barley Wine 12 4.600000
4 Notorius IPA3 22 4.592727
5 Sea Monster (3 Sheets Rum Oak Chip Aged) 12 4.585000
6 King Henry 98 4.573980
7 AleSmith Speedway Stout - Kopi Luwak 23 4.568261
8 Wooden Hell 76 4.553289
9 Hoppy Birthday 65 4.552154
10 Imperial Eclipse Stout - Pappy Van Winkle 49 4.550408
11 Ashy Cynic 18 4.543333
12 Royal Oil 15 4.542667
13 Galaxy Imperial Single Hop IPA 76 4.542500
14 Founders KBS (Kentucky Breakfast Stout) 3864 4.537521
15 Darkness - Bourbon Barrel Aged 30 4.532667
16 Double Sunshine IPA 85 4.531647
17 Carnie Fire 10 4.530000
18 Marshal Zhukov's Imperial Stout - Zhukov's Fin... 19 4.525263
19 The Abyss 1412 4.518746
20 Samuel Adams Millennium 22 4.517273
21 Birth Of Tragedy 29 4.515517
22 Southampton Berliner Weisse 41 4.512927
23 Russian Roulette 12 4.509167
24 Bourbon County Brand Coffee Stout 456 4.508026
In [99]:
# plot top 25 beers
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

sns.pointplot(data=top25_beers_df, x='weighted_review', y='beer_name', order=top25_beers.beer_name.values, capsize=0.4)

plt.grid(True)
plt.title('Top 25 Beers by Average Review Score', fontsize=14)
plt.ylabel('Beer Names')
plt.xlabel('Average Review Score')
plt.yticks(weight=600);

Again the confidence intervals in this plot are very useful. For example, the beer "Three Sheets Barley Wine" is ranked 3rd with a rating of 4.6, however it's error bars are fairly large. This tells us that we aren't that confident with the mean we have calculated, which is due to the fact that this beer only has 12 reviews.


weighted_review vs brewery_type

In [100]:
# Any brewery types with <10 reviews?
top_brew_type = beer_df_clean.groupby('brewery_type').weighted_review.agg(['count', 'mean'])\
                .sort_values(by=['mean'],ascending=False)
top_brew_type
Out[100]:
count mean
brewery_type
planning 10263 4.098650
brewpub 57998 3.930907
regional 364947 3.893626
micro 56474 3.800290
proprietor 117 3.681197
contract 3821 3.661306
large 69897 3.657536

Nope. So let's go straight ahead and plot.

In [101]:
plt.figure(figsize=(10,12))
sns.set_theme(style='darkgrid')

sns.pointplot(data=beer_df_clean, x='weighted_review', y='brewery_type', order=top_brew_type.index)

plt.grid(True)
plt.title('Brewery Types by Average Review Score', fontsize=14)
plt.ylabel('Brewery Type')
plt.xlabel('Average Review Score')
plt.yticks(weight=600);

For brewery types, the only type with large error bars is "propietor", with only 117 reviews of beers from propietor breweries.

It is interesting how "planning" breweries, which are essentially small home breweries, have the highest ratings. Since these are small batch breweries, it could be the case that more time and attention goes into each batch, which translates to higher reviews.

On the other end, "large" breweries have the lowest average ratings. Its possible that the opposite phenomena is happening in large breweries, when compared to "planning" breweries.


weighted_review vs. brewery_name

In [102]:
# filter and create dataframe. 
brew_filter = beer_df_clean.groupby('brewery_name').count().weighted_review
brew_filter1 = brew_filter[brew_filter<10].index
top25_brew_df = beer_df_clean.query("brewery_name not in @brew_filter1")
top25_brew_df
Out[102]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.75 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.01 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.68 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.75 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563074 rows × 16 columns

In [103]:
top25_brew = top25_brew_df.groupby('brewery_name').weighted_review.agg(['count', 'mean'])\
            .sort_values(by=['mean'],ascending=False).head(25)
top25_brew
Out[103]:
count mean
brewery_name
Hill Farmstead Brewery 1518 4.283999
Hardywood Park Craft Brewery 22 4.279545
Barley Brown's Brewpub 36 4.250833
Engine 15 Brewing Company 10 4.220000
Free State Brewing Co. 101 4.220000
Surly Brewing Company 12424 4.218210
AleSmith Brewing Company 6303 4.213232
Minneapolis Town Hall Brewery 2623 4.203317
COAST Brewing Company 969 4.193055
Boneyard Beer Company 160 4.192250
FiftyFifty Brewing Co. 617 4.190097
Alpine Beer Company 2940 4.189337
Founders Brewing Company 39910 4.185104
Anchorage Brewing Company 263 4.170266
Logsdon Farmhouse Ales 18 4.148889
Carton Brewing Company 14 4.145714
Bear Republic Brewing Co. 7816 4.145512
Jack's Abby Brewing 59 4.145254
Firestone Walker Brewing Co. 10942 4.136999
Rockmill Brewery 47 4.125319
Ballast Point Brewing Company 29712 4.114748
Lawson's Finest Liquids 600 4.114433
Golden City Brewery 35 4.111143
Highland Brewing Company Ltd. 19 4.111053
Glacier Brewhouse 100 4.107500
In [104]:
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

sns.pointplot(data=top25_brew_df, x='weighted_review', y='brewery_name', order=top25_brew.index, capsize=0.4)

plt.grid(True)
plt.title('Top 25 Breweries by Average Review Score', fontsize=14)
plt.ylabel('Brewery Names')
plt.xlabel('Average Review Score')
plt.yticks(weight=600);

weighted_review vs. state

In [105]:
# any states with <10 reviews?
top_states = beer_df_clean.groupby('state').weighted_review.agg(['count', 'mean'])\
            .sort_values(by=['mean'],ascending=False).reset_index()
top_states['mean'] = top_states['mean'].round(2)
top_states
Out[105]:
state count mean
0 Kansas 132 4.11
1 Minnesota 19380 4.10
2 Michigan 83054 4.03
3 Oregon 14889 4.03
4 Virginia 7578 4.00
5 Alabama 328 4.00
6 Illinois 21491 4.00
7 California 81833 4.00
8 Nebraska 6144 3.98
9 Maine 8982 3.98
10 Florida 7373 3.97
11 New York 38440 3.93
12 Delaware 33892 3.92
13 Georgia 109 3.92
14 South Carolina 1749 3.91
15 Alaska 3651 3.87
16 Pennsylvania 4709 3.84
17 Connecticut 253 3.82
18 Ohio 71603 3.81
19 Washington 5782 3.80
20 Colorado 28979 3.80
21 West Virginia 41 3.79
22 New Mexico 23 3.78
23 North Carolina 2531 3.78
24 Iowa 266 3.77
25 Massachusetts 19437 3.77
26 North Dakota 6 3.74
27 Maryland 19607 3.73
28 Montana 3179 3.72
29 Indiana 1290 3.72
30 New Jersey 2645 3.71
31 Idaho 1224 3.71
32 District of Columbia 59 3.70
33 Nevada 448 3.70
34 Vermont 12234 3.69
35 Arizona 797 3.68
36 Mississippi 630 3.67
37 Hawaii 4048 3.65
38 Wyoming 20 3.62
39 New Hampshire 1514 3.60
40 Wisconsin 12923 3.59
41 Rhode Island 313 3.58
42 Utah 1052 3.57
43 Kentucky 588 3.57
44 Texas 7528 3.55
45 Arkansas 276 3.52
46 South Dakota 92 3.50
47 Tennessee 237 3.49
48 Louisiana 4991 3.45
49 Oklahoma 100 3.09
50 Missouri 25067 3.07

North Dakota is the only state with less than 10 reviews, but we won't drop it so that we can at least compare all 50 states.

In [106]:
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

sns.pointplot(data=beer_df_clean, x='weighted_review', y='state', order=top_states.state)

plt.grid(True)
plt.title('States by Average Review Score', fontsize=14)
plt.ylabel('States')
plt.xlabel('Average Review Score')
plt.yticks(weight=600, fontsize=12);

Oklahoma and Missouri seem to have signifcantly lower review ratings than the rest of the states. I wonder if the "large" breweries are located in either Oklahoma or Missouri, since they have the lowest review ratings?

In [107]:
beer_df_clean.query("state == 'Missouri'").brewery_type.value_counts()
Out[107]:
large         15165
regional       7696
micro          1916
brewpub         219
contract         61
proprietor       10
Name: brewery_type, dtype: int64
In [108]:
beer_df_clean.query("state == 'Missouri'").brewery_name.value_counts()
Out[108]:
Anheuser-Busch                       15165
Boulevard Brewing Co.                 7676
O'Fallon Brewery                      1667
Charleville Vineyard Microbrewery      192
Flat Branch Pub & Brewery               67
Cathedral Square Brewery                61
Perennial Artisan Ales                  50
McCoy's Public House                    40
Augusta Brewing Company                 34
Ferguson Brewing Company                26
Prison Brews                            20
Schlafly Bottleworks                    16
Broadway Brewery & Restaurant           14
2nd Shift Brewery                       12
Morgan Street Brewery                   10
Charlie's Steak, Ribs & Ale              6
Piney River Brewing Company              4
4 Hands Brewing Co.                      4
Public House Brewing Company             3
Name: brewery_name, dtype: int64

So it looks like there about 15,000 reviews from the "large" brewery, Anheuser-Busch, which most likely is dragging down the average review score for beers brewed in Missouri, since "large" breweries also have the lowest average rating.

The last thing I want to do with this state data is create some sort of geospatial plot that might show some regional trends in average ratings. Let's give that a shot.

In [109]:
# First we need to add a state 'code' column to properly map the states in the map/
state_codes = {
    "Alabama":"AL",
    "Alaska":"AK",
    "Arizona":"AZ",
    "Arkansas":"AR",
    "California":"CA",
    "Colorado":"CO",
    "Connecticut":"CT",
    "Delaware":"DE",
    "District of Columbia":"DC",
    "Florida":"FL",
    "Georgia":"GA",
    "Hawaii":"HI",
    "Idaho":"ID",
    "Illinois":"IL",
    "Indiana":"IN",
    "Iowa":"IA",
    "Kansas":"KS",
    "Kentucky":"KY",
    "Louisiana":"LA",
    "Maine":"ME",
    "Montana":"MT",
    "Nebraska":"NE",
    "Nevada":"NV",
    "New Hampshire":"NH",
    "New Jersey":"NJ",
    "New Mexico":"NM",
    "New York":"NY",
    "North Carolina":"NC",
    "North Dakota":"ND",
    "Ohio":"OH",
    "Oklahoma":"OK",
    "Oregon":"OR",
    "Maryland":"MD",
    "Massachusetts":"MA",
    "Michigan":"MI",
    "Minnesota":"MN",
    "Mississippi":"MS",
    "Missouri":"MO",
    "Pennsylvania":"PA",
    "Rhode Island":"RI",
    "South Carolina":"SC",
    "South Dakota":"SD",
    "Tennessee":"TN",
    "Texas":"TX",
    "Utah":"UT",
    "Vermont":"VT",
    "Virginia":"VA",
    "Washington":"WA",
    "West Virginia":"WV",
    "Wisconsin":"WI",
    "Wyoming":"WY"
}
In [110]:
top_states['state_code'] = top_states.state.replace(to_replace=state_codes)
top_states
Out[110]:
state count mean state_code
0 Kansas 132 4.11 KS
1 Minnesota 19380 4.10 MN
2 Michigan 83054 4.03 MI
3 Oregon 14889 4.03 OR
4 Virginia 7578 4.00 VA
5 Alabama 328 4.00 AL
6 Illinois 21491 4.00 IL
7 California 81833 4.00 CA
8 Nebraska 6144 3.98 NE
9 Maine 8982 3.98 ME
10 Florida 7373 3.97 FL
11 New York 38440 3.93 NY
12 Delaware 33892 3.92 DE
13 Georgia 109 3.92 GA
14 South Carolina 1749 3.91 SC
15 Alaska 3651 3.87 AK
16 Pennsylvania 4709 3.84 PA
17 Connecticut 253 3.82 CT
18 Ohio 71603 3.81 OH
19 Washington 5782 3.80 WA
20 Colorado 28979 3.80 CO
21 West Virginia 41 3.79 WV
22 New Mexico 23 3.78 NM
23 North Carolina 2531 3.78 NC
24 Iowa 266 3.77 IA
25 Massachusetts 19437 3.77 MA
26 North Dakota 6 3.74 ND
27 Maryland 19607 3.73 MD
28 Montana 3179 3.72 MT
29 Indiana 1290 3.72 IN
30 New Jersey 2645 3.71 NJ
31 Idaho 1224 3.71 ID
32 District of Columbia 59 3.70 DC
33 Nevada 448 3.70 NV
34 Vermont 12234 3.69 VT
35 Arizona 797 3.68 AZ
36 Mississippi 630 3.67 MS
37 Hawaii 4048 3.65 HI
38 Wyoming 20 3.62 WY
39 New Hampshire 1514 3.60 NH
40 Wisconsin 12923 3.59 WI
41 Rhode Island 313 3.58 RI
42 Utah 1052 3.57 UT
43 Kentucky 588 3.57 KY
44 Texas 7528 3.55 TX
45 Arkansas 276 3.52 AR
46 South Dakota 92 3.50 SD
47 Tennessee 237 3.49 TN
48 Louisiana 4991 3.45 LA
49 Oklahoma 100 3.09 OK
50 Missouri 25067 3.07 MO
In [111]:
import plotly.graph_objects as go

fig = go.Figure(data = go.Choropleth(
    locations = top_states['state_code'],
    z = top_states['mean'],
    locationmode = 'USA-states',
    text = top_states['state']+'<br>'+'# reviews: '+ top_states['count'].astype(str),
    colorscale = 'Blues',
    colorbar_title = 'Average<br>Review<br>Score',
    colorbar = {'ticks':'outside'}
))

fig.update_layout(
    title_text = 'Average Beer Rating by State<br>(Hover for more info)',
    geo = dict(
        scope = 'usa',
        projection = go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)')
)

So interestingly enough this heatmap of the US does give us some additional insight.

For example, it looks like the coasts brew higher rated beer than the middle of the country. Except for Kansas and Nebraska which have the 1st and 9th highest average reviews respectively. Kansas tops the list with an average rating of 4.11, however it only has 132 reviews, which is realtively low.

In addition, it looks like the states surrounding the Great Lakes have high ratings as well. This is interesting, I wonder if it is because they have easy access to fresh water?

Next, I want to try and make a similar geospatial map with cities, and then figure how to overlay the two maps into one visualization.

In [112]:
# first we want to filter out cities with less than 10 reviews. 
city_filter = beer_df_clean.city.value_counts()
city_filter = city_filter[city_filter>=10].index
top_cities_df = beer_df_clean.query('city in @city_filter')
top_cities_df
Out[112]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.75 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.01 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.68 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.75 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563243 rows × 16 columns

In [113]:
# create dataframe of cities ranked by average rating. 
top_cities = top_cities_df.groupby('city').weighted_review.agg(['count', 'mean'])\
            .sort_values(by=['mean'],ascending=False).reset_index()
top_cities['mean'] = top_cities['mean'].round(2)
top_cities
Out[113]:
city count mean
0 Greensboro Bend 1518 4.28
1 Baker City 36 4.25
2 Brooklyn Ctr 6212 4.22
3 Minneapolis 8971 4.21
4 North Charleston 969 4.19
... ... ... ...
339 Saint Charles 117 2.68
340 Osseo 60 2.67
341 Inyokern 206 2.67
342 Saint Louis 15344 2.58
343 Big Bear Lake 18 2.32

344 rows × 3 columns

I am going to skip the pointplot for cities and go straight into the map.

First we need to append longitudes and latitudes to the top_cities dataframe as location references for the map.

In [114]:
top_cities = top_cities.merge(top_cities_df.groupby('city').first(), on='city')[['city', 'count', 'mean', 'longitude', 'latitude']]
top_cities
Out[114]:
city count mean longitude latitude
0 Greensboro Bend 1518 4.28 NaN NaN
1 Baker City 36 4.25 -117.829496 44.778780
2 Brooklyn Ctr 6212 4.22 -93.324389 45.042942
3 Minneapolis 8971 4.21 -93.266640 45.046388
4 North Charleston 969 4.19 NaN NaN
... ... ... ... ... ...
339 Saint Charles 117 2.68 NaN NaN
340 Osseo 60 2.67 -91.226651 44.582707
341 Inyokern 206 2.67 NaN NaN
342 Saint Louis 15344 2.58 -90.302316 38.738390
343 Big Bear Lake 18 2.32 -116.923042 34.238761

344 rows × 5 columns

In [115]:
# drop any cities without location data.
top_cities.dropna(inplace=True)
top_cities
Out[115]:
city count mean longitude latitude
1 Baker City 36 4.25 -117.829496 44.778780
2 Brooklyn Ctr 6212 4.22 -93.324389 45.042942
3 Minneapolis 8971 4.21 -93.266640 45.046388
6 Grand Rapids 20101 4.18 -85.676575 42.952496
7 Atlantic Highlands 14 4.15 -74.038165 40.411769
... ... ... ... ... ...
335 Monroe 1341 2.91 -89.641509 42.598631
337 La Crosse 409 2.73 -91.253350 43.804355
340 Osseo 60 2.67 -91.226651 44.582707
342 Saint Louis 15344 2.58 -90.302316 38.738390
343 Big Bear Lake 18 2.32 -116.923042 34.238761

256 rows × 5 columns

In [116]:
# round longitude and latitude values so that they aren't cumbersome when they are shown on the map. 
top_cities[['longitude', 'latitude']] = top_cities[['longitude', 'latitude']].round(2)
top_cities
Out[116]:
city count mean longitude latitude
1 Baker City 36 4.25 -117.83 44.78
2 Brooklyn Ctr 6212 4.22 -93.32 45.04
3 Minneapolis 8971 4.21 -93.27 45.05
6 Grand Rapids 20101 4.18 -85.68 42.95
7 Atlantic Highlands 14 4.15 -74.04 40.41
... ... ... ... ... ...
335 Monroe 1341 2.91 -89.64 42.60
337 La Crosse 409 2.73 -91.25 43.80
340 Osseo 60 2.67 -91.23 44.58
342 Saint Louis 15344 2.58 -90.30 38.74
343 Big Bear Lake 18 2.32 -116.92 34.24

256 rows × 5 columns

Before we plot, we're going to need a dummy row, since there are no cities with average scores less than 2. What this means is that the legend on the map won't show a marker for data points between 1-2. Rather it would only show markers for, 2-3, 3-4, and 4-5, which is incomplete.

Having a dummy row with NaN values will give us a data point we can use to populate a marker for review scores 1-2, even though there are no cities with an average score below 2.

In [117]:
top_cities.reset_index(drop=True, inplace=True)
top_cities = top_cities.append({'city':'Dummy row', 'count':np.nan, 'mean':np.nan, 'longitude':np.nan, 'latitude':np.nan}, ignore_index=True)
top_cities
Out[117]:
city count mean longitude latitude
0 Baker City 36.0 4.25 -117.83 44.78
1 Brooklyn Ctr 6212.0 4.22 -93.32 45.04
2 Minneapolis 8971.0 4.21 -93.27 45.05
3 Grand Rapids 20101.0 4.18 -85.68 42.95
4 Atlantic Highlands 14.0 4.15 -74.04 40.41
... ... ... ... ... ...
252 La Crosse 409.0 2.73 -91.25 43.80
253 Osseo 60.0 2.67 -91.23 44.58
254 Saint Louis 15344.0 2.58 -90.30 38.74
255 Big Bear Lake 18.0 2.32 -116.92 34.24
256 Dummy row NaN NaN NaN NaN

257 rows × 5 columns

Now we have a clean dataframe to use to create the map.

In [118]:
limits = [1,2,3,4]
colors = ['lightcyan', 'rgb(59,59,59)', 'cornflowerblue', 'rgb(241,105,19)']

fig = go.Figure()

# trace for dummy data, used to populate a 1-2 marker in the legend. 
df_1_2 = top_cities.query("city == 'Dummy row'")    
fig.add_trace(go.Scattergeo(
    name = '1 - 2',
    visible = 'legendonly',
    showlegend = True,
    lon = df_1_2['longitude'],
    lat = df_1_2['latitude'],
    marker = dict(
        color = 'lightgrey',
        line_color = 'rgb(40,40,40)',
        line_width = 0.5,
        size = 8)))

# city review data
for i in range(len(limits)):
    df_sub = top_cities[top_cities['mean'].between(i+1, i+2)]
    fig.add_trace(go.Scattergeo(
        locationmode = 'USA-states',
        lon = df_sub['longitude'],
        lat = df_sub['latitude'],
        text = df_sub['city']+'<br>'+'Review score: '+df_sub['mean'].astype(str)+'<br>'+'# reviews: '+df_sub['count'].astype(str),
        name = f'{i+1} - {i+2}',
        marker = dict(
            color = colors[i],
            opacity = 0.8,
            size = 8,
            line_color = 'rgb(40,40,40)',
            line_width = 0.5))) 

fig.update_layout(
        title = dict(
            text = 'Top Rated Cities for Drinking Beer<br>(Click on legend to toggle ratings)',
            xanchor = 'center',
            x = 0.5),
        showlegend = True,
        legend = dict(
            title = 'Review Score',
            x = 0.92,
            y = 0.52),
        geo = dict(
            scope = 'usa',
            landcolor = 'rgb(217,217,217)'))

Similar to the state map, many of the high rated cities are located on the coasts and around the great lakes.

No that we have two independent maps for states and cities, I want to try and merge both of them into one comprehensive map.

Individually, using color encodings works well, however if I want to combined them then I can't use color encodings for both. I think it would be best to keep the color encoding for the state portion of the map, and change the encoding for cities to color and shape.

In [119]:
colors = ['white', 'gold', 'crimson']
shapes = ['x', 'circle', 'star']
opacities = [1, 0.8, 1]
sizes = [9,8,10]

# state data
fig = go.Figure(go.Choropleth(
    locations = top_states['state_code'],
    z = top_states['mean'],
    locationmode = 'USA-states',
    text = top_states['state']+'<br>'+'# reviews: '+ top_states['count'].astype(str),
    name = 'state',
    colorscale = 'Blues',
    colorbar_title = 'State<br>Review<br>Score',
    colorbar = dict(
        ticks = 'outside',
        x = 0.95,
        y = 0.55)))

# trace for dummy data, used to populate a 1-2 marker in the legend. 
df_1_2 = top_cities.query("city == 'Dummy row'")    
fig.add_trace(go.Scattergeo(
    name = '1 - 2',
    visible = 'legendonly',
    showlegend = True,
    lon = df_1_2['longitude'],
    lat = df_1_2['latitude'],
    marker = dict(
        symbol = 'x-thin',
        color = 'black',
        line_color = 'rgb(40,40,40)',
        line_width = 0.5,
        size = 8)))

# city data
for i in range(len(colors)):
    df_sub = top_cities[top_cities['mean'].between(i+2, i+3)]
    fig.add_trace(go.Scattergeo(
        locationmode = 'USA-states',
        lon = df_sub['longitude'],
        lat = df_sub['latitude'],
        text = df_sub['city']+'<br>'+'Review score: '+df_sub['mean'].astype(str)+'<br>'+'# reviews: '+df_sub['count'].astype(str),
        name = f'{i+2} - {i+3}',
        marker = dict(
            symbol = shapes[i],
            color = colors[i],
            size = sizes[i],
            opacity= opacities[i],
            line_color = 'black',
            line_width = 0.75))) 

fig.update_layout(
        title = dict(
            text = 'Top Rated States & Cities for Drinking Beer<br>(Click on legend to toggle ratings)',
            xanchor = 'center',
            x = 0.5), 
        showlegend = True,
        legend = dict(
            title = dict(
                text = 'City Review Score',
                side = 'top'),
            xanchor = 'center',
            x = 0.5,
            y = -0.03,
            itemsizing = 'trace',
            orientation = 'h'),
        geo = dict(
            scope = 'usa',
            projection = go.layout.geo.Projection(type = 'albers usa'),
            showlakes = True,
            lakecolor = 'rgb(255, 255, 255)',
            landcolor = 'rgb(217,217,217)'))

Adding a shape encoding does seem to help a little bit when combining the two maps. Cities with the highest ratings, 4-5, stand out more with the star shape encoding.

By combining the two maps we can see a nice correlation between high rated states and the number of high rated cities in that state.


Now I want to move onto exploring a few qualitative variables with the quantitative variable beer_abv.

Let's take a look at beer_abv vs. beer_style.

First we will need to drop all beers that have multiple reviews, so that the beer_abv means aren't skewed by beers that have more reviews.

In [120]:
beer_unique = beer_df_clean.drop_duplicates(subset=['beer_name'])
beer_unique
Out[120]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
10 10789 Caldera Oatmeal Stout Oatmeal Stout 7.2 2.97 3.0 3.0 2.5 3.0 3.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
12 12386 Caldera OBF 15 American Pale Lager 5.6 3.76 4.0 3.0 4.0 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
13 58046 Rauch Ür Bock Rauchbier 7.4 4.36 4.5 4.5 3.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563949 44662 Whitewash Wheat Witbier 5.5 3.85 4.0 4.0 4.0 4.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563955 46681 PB Porter American Porter 6.5 3.50 3.5 3.5 3.5 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563959 58076 Amber Wave American Amber / Red Ale 5.4 3.29 4.0 3.5 2.5 3.0 3.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

9590 rows × 16 columns

In [121]:
top25_style_abv = beer_unique.groupby('beer_style').beer_abv.agg(['count', 'mean']).sort_values(by='mean', ascending=False)
top25_style_abv = top25_style_abv.query('count >= 10')
top25_style_abv = top25_style_abv.head(25)
top25_style_abv
Out[121]:
count mean
beer_style
Eisbock 11 11.118182
American Barleywine 197 10.582538
Quadrupel (Quad) 51 10.487255
English Barleywine 74 10.417568
Wheatwine 42 10.326190
American Double / Imperial Stout 305 10.236492
Russian Imperial Stout 165 9.816848
Belgian Strong Dark Ale 125 9.542480
American Strong Ale 206 9.320534
Tripel 129 9.207442
American Double / Imperial IPA 360 9.031500
Braggot 13 8.846154
Belgian Strong Pale Ale 109 8.771101
Old Ale 77 8.723377
American Malt Liquor 20 8.330000
Weizenbock 52 8.282308
Doppelbock 102 8.248725
English Strong Ale 34 8.208824
Baltic Porter 68 8.183676
Scotch Ale / Wee Heavy 126 8.153095
American Double / Imperial Pilsner 37 7.881081
Belgian IPA 77 7.818442
Dubbel 94 7.535851
American Black Ale 102 7.408725
Flanders Oud Bruin 19 7.405263
In [122]:
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

sns.pointplot(data=beer_unique, x='beer_abv', y='beer_style', order=top25_style_abv.index)

plt.grid(True)
plt.title('Beer Style by Average ABV', fontsize=14)
plt.ylabel('Beer Style')
plt.xlabel('Average ABV')
plt.yticks(weight=600, fontsize=12);

The above plot shows the top 25 beer styles based on average ABV. The highest average ABV beer style is Eisbock, which has a fairly large error bar due to only having 11 unique Eisbock style beers in the dataset.

I've never heard of a "wine" brew, like barleywine or wheatwine, but they hold 3 of the top 5 positions with relatively small error bars so there should be an explanation for this.

A quick google search tells us that it is in fact one of the most intense and strongest beer styles brewed. Cool.


beer_abv vs. brewery_type

In [123]:
top_brewtype_abv = beer_unique.groupby('brewery_type').beer_abv.agg(['count', 'mean'])\
                    .sort_values(by='mean', ascending=False).head(25)
top_brewtype_abv
Out[123]:
count mean
brewery_type
proprietor 7 7.914286
regional 2504 7.181066
contract 140 6.775000
micro 2591 6.759765
planning 43 6.690698
brewpub 3639 6.583770
large 666 6.550961

Alright so it looks like "proprietor" breweries have the highest average ABV. However, there are only 7 values making up this mean. So let's take a closer look at these values and see if it is worth keeping "proprietor" breweries.

In [124]:
beer_unique.query("brewery_type == 'proprietor'")
Out[124]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
75900 65944 Hop Riot IPA American IPA 7.3 4.25 4.5 4.5 4.5 4.0 4.0 High Water Brewing proprietor Stockton California -121.290780 37.957702
75903 65945 Retribution American Double / Imperial IPA 9.5 3.43 3.5 3.0 3.5 4.0 3.5 High Water Brewing proprietor Stockton California -121.290780 37.957702
75938 71267 Pom Cherry Bomb Fruit / Vegetable Beer 7.0 4.52 4.5 4.5 4.0 5.0 4.5 High Water Brewing proprietor Stockton California -121.290780 37.957702
75983 73636 No Boundary IPA American IPA 6.5 4.07 4.5 4.0 3.5 4.0 4.0 High Water Brewing proprietor Stockton California -121.290780 37.957702
75990 75585 Bourbon Barrel-Aged Baltic Porter Baltic Porter 9.3 2.53 2.5 2.5 3.0 2.5 2.5 High Water Brewing proprietor Stockton California -121.290780 37.957702
75991 75586 Brandy Barrel-Aged Baltic Porter Baltic Porter 9.3 2.03 2.0 2.0 2.5 2.0 2.0 High Water Brewing proprietor Stockton California -121.290780 37.957702
344527 73645 Revival Saison Saison / Farmhouse Ale 6.5 4.05 4.0 4.0 4.0 4.5 4.0 Revival Brewing Co. proprietor Cranston Rhode Island -71.471696 41.789838

So it doesn't seem like there are any beers in specific that are skewing the mean. However, since we have been holding to the minimum of 10 reviews/values to be considered significant during this analysis, let's stay consistent and drop "proprietor" breweries.

In [125]:
top_brewtype_abv = top_brewtype_abv.query("count >= 10")
top_brewtype_abv
Out[125]:
count mean
brewery_type
regional 2504 7.181066
contract 140 6.775000
micro 2591 6.759765
planning 43 6.690698
brewpub 3639 6.583770
large 666 6.550961
In [126]:
plt.figure(figsize=(10,12))
sns.set_theme(style='darkgrid')

sns.pointplot(data=beer_unique, x='beer_abv', y='brewery_type', order=top_brewtype_abv.index)

plt.grid(True)
plt.title('Brewery Types by Average ABV', fontsize=14)
plt.ylabel('Brewery Type')
plt.xlabel('Average ABV')
plt.yticks(weight=600);

It's interesting to see "large" breweries at the bottom of average ABV. It makes sense though, since large breweries are brewing for profit at scale, and making lower ABV beer is cheaper and faster than higher ABV beer.


Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

In this section we focused on a few quantitative variables, review_overall, weighted_review, and beer_abv. We compared these variables to a host of different qualitative variables, such as beer_name, brewery_type, state etc.

Some interesting findings with these variables:

  • review_taste has the highest correlation with review_overall, while beer_abv has the lowest correlation with review_overall.
  • Stout beers outperform in terms of average review score for both style and specific beers.
  • 'Planning' breweries, such as home breweries, have the highest average review scores. While 'large' breweries have the lowest average review scores.
  • Based on the geospatial maps, we found that states located on the coasts and around the great lakes tend to have higher average review scores. We found the same trend with cities.
  • When it came to beer_abv, we found that wine style beers, such as barleywine and wheatwine, have some of the highest average ABV values of any beer style, with 3 out of the top 5 ranking spots in Beer Style by Average ABV.
  • We also found that 'large' breweries have the lowest average ABV values. This makes sense, especially in tandem with having the lowest average review scores, since large breweries optimize for mass production. Part of that optimization causes large breweries to sacrifice quality, and produce beers with low ABV levels, which in general are faster and cheaper to brew.

Multivariate Exploration

For this section, I want to start by looking at the correlations between review_overall, which is the reviewer's overall impression of the beer, and the other main variables of interest, beer_abv, review_aroma, review_appearance, review_palate, review_taste, with respect to the categorical variable brewery_type.

We already found these correlations at the beginning of the 'Bivariate Exploration' section, without respect to brewery_type. The goal here is to see if certain brewery types have any significant affect on the correlations.

The practical question being answered here is: Does a certain brewery type have higher predictive value when it comes to review_overall scores?

In [127]:
# loops through each brewery type and corr. variable to find a corr. coeff. for each combination w/ respect to `review_overall`.
brewery_types = beer_df_clean.brewery_type.unique()
variables = ['beer_abv', 'review_aroma', 'review_appearance', 'review_palate', 'review_taste']
corr_list = []

for i in brewery_types:
    df = beer_df_clean[beer_df_clean['brewery_type'] == i]
    for var in variables:
        corr = df['review_overall'].corr(df[var])
        corr_dict = {'brewery_type':i, 'corr_variable':var, 'corr':corr}
        corr_list.append(corr_dict)
corr_list
Out[127]:
[{'brewery_type': 'micro',
  'corr_variable': 'beer_abv',
  'corr': 0.012526642381412429},
 {'brewery_type': 'micro',
  'corr_variable': 'review_aroma',
  'corr': 0.5922222631859658},
 {'brewery_type': 'micro',
  'corr_variable': 'review_appearance',
  'corr': 0.41664998498238043},
 {'brewery_type': 'micro',
  'corr_variable': 'review_palate',
  'corr': 0.6733650733979493},
 {'brewery_type': 'micro',
  'corr_variable': 'review_taste',
  'corr': 0.7910353269680666},
 {'brewery_type': 'brewpub',
  'corr_variable': 'beer_abv',
  'corr': 0.13365098743298603},
 {'brewery_type': 'brewpub',
  'corr_variable': 'review_aroma',
  'corr': 0.5808576911931546},
 {'brewery_type': 'brewpub',
  'corr_variable': 'review_appearance',
  'corr': 0.44989536764754695},
 {'brewery_type': 'brewpub',
  'corr_variable': 'review_palate',
  'corr': 0.6776375696599808},
 {'brewery_type': 'brewpub',
  'corr_variable': 'review_taste',
  'corr': 0.7804677926197844},
 {'brewery_type': 'regional',
  'corr_variable': 'beer_abv',
  'corr': 0.056070130306099766},
 {'brewery_type': 'regional',
  'corr_variable': 'review_aroma',
  'corr': 0.5467169355922762},
 {'brewery_type': 'regional',
  'corr_variable': 'review_appearance',
  'corr': 0.42048753151008644},
 {'brewery_type': 'regional',
  'corr_variable': 'review_palate',
  'corr': 0.6425836247307477},
 {'brewery_type': 'regional',
  'corr_variable': 'review_taste',
  'corr': 0.7506437115235541},
 {'brewery_type': 'large',
  'corr_variable': 'beer_abv',
  'corr': 0.28130902826242055},
 {'brewery_type': 'large',
  'corr_variable': 'review_aroma',
  'corr': 0.7171614064043597},
 {'brewery_type': 'large',
  'corr_variable': 'review_appearance',
  'corr': 0.6458759880977002},
 {'brewery_type': 'large',
  'corr_variable': 'review_palate',
  'corr': 0.7690822024322875},
 {'brewery_type': 'large',
  'corr_variable': 'review_taste',
  'corr': 0.8316946866430164},
 {'brewery_type': 'contract',
  'corr_variable': 'beer_abv',
  'corr': 0.16286765691936497},
 {'brewery_type': 'contract',
  'corr_variable': 'review_aroma',
  'corr': 0.6102266405467903},
 {'brewery_type': 'contract',
  'corr_variable': 'review_appearance',
  'corr': 0.45291352702610577},
 {'brewery_type': 'contract',
  'corr_variable': 'review_palate',
  'corr': 0.6857075280585636},
 {'brewery_type': 'contract',
  'corr_variable': 'review_taste',
  'corr': 0.8043782642265439},
 {'brewery_type': 'proprietor',
  'corr_variable': 'beer_abv',
  'corr': -0.008352243766859014},
 {'brewery_type': 'proprietor',
  'corr_variable': 'review_aroma',
  'corr': 0.7715927902829075},
 {'brewery_type': 'proprietor',
  'corr_variable': 'review_appearance',
  'corr': 0.5395636267474165},
 {'brewery_type': 'proprietor',
  'corr_variable': 'review_palate',
  'corr': 0.7464079993855959},
 {'brewery_type': 'proprietor',
  'corr_variable': 'review_taste',
  'corr': 0.8759607430057904},
 {'brewery_type': 'planning',
  'corr_variable': 'beer_abv',
  'corr': 0.0829995976542906},
 {'brewery_type': 'planning',
  'corr_variable': 'review_aroma',
  'corr': 0.5539129640717247},
 {'brewery_type': 'planning',
  'corr_variable': 'review_appearance',
  'corr': 0.4109868401707794},
 {'brewery_type': 'planning',
  'corr_variable': 'review_palate',
  'corr': 0.6132823221740077},
 {'brewery_type': 'planning',
  'corr_variable': 'review_taste',
  'corr': 0.7293727837516896}]

Above we created a list of dicts, where each dict contains a correlation coefficient, for each correlation variable, for each brewery type.

We are essentially running the same code we did when we found the corr. coeff. of these variables at the beginning of the 'Bivariate Exploration' section. Except here, we're finding the corr. coeff. for each brewery type, instead of the entire dataframe beer_df_clean.

Now from this list of dicts we can easily create a dataframe to plot with using seaborn.

In [128]:
corr_df = pd.DataFrame(corr_list)
corr_df
Out[128]:
brewery_type corr_variable corr
0 micro beer_abv 0.012527
1 micro review_aroma 0.592222
2 micro review_appearance 0.416650
3 micro review_palate 0.673365
4 micro review_taste 0.791035
5 brewpub beer_abv 0.133651
6 brewpub review_aroma 0.580858
7 brewpub review_appearance 0.449895
8 brewpub review_palate 0.677638
9 brewpub review_taste 0.780468
10 regional beer_abv 0.056070
11 regional review_aroma 0.546717
12 regional review_appearance 0.420488
13 regional review_palate 0.642584
14 regional review_taste 0.750644
15 large beer_abv 0.281309
16 large review_aroma 0.717161
17 large review_appearance 0.645876
18 large review_palate 0.769082
19 large review_taste 0.831695
20 contract beer_abv 0.162868
21 contract review_aroma 0.610227
22 contract review_appearance 0.452914
23 contract review_palate 0.685708
24 contract review_taste 0.804378
25 proprietor beer_abv -0.008352
26 proprietor review_aroma 0.771593
27 proprietor review_appearance 0.539564
28 proprietor review_palate 0.746408
29 proprietor review_taste 0.875961
30 planning beer_abv 0.083000
31 planning review_aroma 0.553913
32 planning review_appearance 0.410987
33 planning review_palate 0.613282
34 planning review_taste 0.729373
In [129]:
plt.figure(figsize=(15,10))
sns.set_style('darkgrid')
sns.pointplot(data=corr_df, x='corr_variable', y='corr', hue='brewery_type', 
              hue_order = top_brew_type.index, linestyles=('dotted'))
plt.title('Correlation with Overall Review by Brewery Type', weight='bold', fontsize=20)
plt.ylabel('Correlation Coefficient', labelpad=20, fontsize=15)
plt.xlabel('Correlation Variable', labelpad=20, fontsize=15)
plt.legend(title='Brewery Type');
In [130]:
# brewery type value counts for reference to the above plot. 
beer_df_clean.brewery_type.value_counts()
Out[130]:
regional      364947
large          69897
brewpub        57998
micro          56474
planning       10263
contract        3821
proprietor       117
Name: brewery_type, dtype: int64

So we actually find something interesting here. Most of the brewery types tend to have realtively similar corr. coeff., except for 'proprietor' and 'large' breweries.

For 'proprietor' breweries, the high variance in corr. coeff. could be due to the fact that there are only 117 data points. This would need some more investigation to show whether or not this is the case, which we won't get into for this anlaysis.

'Large' breweries don't have this potential issue, with almost 70,000 data points. If we ignore 'proprietor' breweries, 'large' breweries have the highest correlations across all variables. What this means is that beer_abv and the other review varibales have a higher predictive value for 'large' brewery beers, in terms of review_overall scores.


Next, I want to move away from correlations and focus on more practical insights and questions for everyday beer drinkers and enthusiasts.

Therefore, I want to start by looking into the following question: What high ABV beers have the highest ratings? In other words, what highly rated beers have the potential to intoxicate you quickly. With this information, your goal could be to get drunk fast or be informed and cautious when drinking high ABV beers.

I will be defining a high abv beer as any beer above the upper quartile of beer_abv.

After analyzing the upper quartile, I will answer the same question for values below the lower quartile, and for values in the IQR.

In [131]:
beer_unique.beer_abv.describe()
Out[131]:
count    9590.000000
mean        6.789241
std         2.170388
min         0.500000
25%         5.200000
50%         6.200000
75%         8.000000
max        41.000000
Name: beer_abv, dtype: float64

We first need to find what ABV values are above the upper quartile.

In [132]:
upperqrt_abv = beer_df_clean.query('beer_abv >= 8')
upperqrt_abv
Out[132]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
107 58920 Vas Deferens Ale Belgian Strong Dark Ale 8.1 3.83 4.0 4.0 4.5 4.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
228 58920 Vas Deferens Ale Belgian Strong Dark Ale 8.1 3.38 3.5 3.5 4.0 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
229 58920 Vas Deferens Ale Belgian Strong Dark Ale 8.1 2.98 3.0 4.0 2.0 3.0 2.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
230 58920 Vas Deferens Ale Belgian Strong Dark Ale 8.1 3.95 4.0 4.0 4.0 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
231 58920 Vas Deferens Ale Belgian Strong Dark Ale 8.1 3.45 3.5 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563923 62954 Unity (2010) California Common / Steam Beer 9.0 3.79 4.0 3.0 4.5 4.0 4.0 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563939 73587 Hopulent American Double / Imperial IPA 8.5 2.13 1.5 3.0 3.5 3.0 1.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563941 73587 Hopulent American Double / Imperial IPA 8.5 1.89 1.5 2.0 3.5 3.0 1.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563947 73588 St. Sideburn (Bourbon Barrel Aged) American Strong Ale 8.5 2.18 1.5 3.0 3.5 3.5 1.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563948 73588 St. Sideburn (Bourbon Barrel Aged) American Strong Ale 8.5 2.01 1.5 2.5 3.5 3.0 1.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

202927 rows × 16 columns

In [133]:
top25_upper_abv = upperqrt_abv.groupby('beer_name').weighted_review.agg(['count', 'mean'])
top25_upper_abv = top25_upper_abv.query('count >= 10').sort_values('mean', ascending=False).reset_index().head(25)
top25_upper_abv = top25_upper_abv.merge(beer_unique.query('beer_name in @top25_upper_abv.beer_name'), on='beer_name')\
                  [['beer_name', 'count', 'mean', 'beer_abv']]
top25_upper_abv
Out[133]:
beer_name count mean beer_abv
0 Rare Bourbon County Stout 249 4.649317 13.0
1 Green Flash Bourbon Barrel Aged Double Stout 14 4.641429 8.9
2 Founders CBS Imperial Stout 1274 4.616342 10.6
3 Three Sheets Barley Wine 12 4.600000 10.0
4 Notorius IPA3 22 4.592727 12.0
5 Sea Monster (3 Sheets Rum Oak Chip Aged) 12 4.585000 10.0
6 King Henry 98 4.573980 13.4
7 AleSmith Speedway Stout - Kopi Luwak 23 4.568261 12.0
8 Wooden Hell 76 4.553289 9.5
9 Imperial Eclipse Stout - Pappy Van Winkle 49 4.550408 9.5
10 Royal Oil 15 4.542667 12.5
11 Galaxy Imperial Single Hop IPA 76 4.542500 8.0
12 Founders KBS (Kentucky Breakfast Stout) 3864 4.537521 11.2
13 Darkness - Bourbon Barrel Aged 30 4.532667 10.3
14 Double Sunshine IPA 85 4.531647 8.0
15 Marshal Zhukov's Imperial Stout - Zhukov's Fin... 19 4.525263 11.5
16 The Abyss 1412 4.518746 11.0
17 Samuel Adams Millennium 22 4.517273 21.0
18 Birth Of Tragedy 29 4.515517 11.0
19 Russian Roulette 12 4.509167 9.3
20 Bourbon County Brand Coffee Stout 456 4.508026 13.0
21 Hunahpu's Imperial Stout - Laird's Apple Brand... 63 4.503333 11.5
22 Bell's Black Note Stout 130 4.500846 11.5
23 Hunahpu's Imperial Stout - Rum Barrel Aged 18 4.485556 11.5
24 Hunahpu's Imperial Stout 429 4.473916 11.5
In [134]:
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

ax = sns.pointplot(data=upperqrt_abv, x='weighted_review', y='beer_name', order=top25_upper_abv.beer_name)

plt.grid(True)
plt.title('Top Rated High ABV Beers' , fontsize=14)
plt.ylabel('Beer Name')
plt.xlabel('Average Rating')
plt.yticks(weight=600, fontsize=12);

We know that all of these beers are in the upper quartile, but there's a large range of values in the upper quartile. I think it would be useful to add each of these beer's ABV value on the plot for quick reference and comparison.

In [135]:
# create list of new tick labels with ABV values
upper_abv = top25_upper_abv.beer_abv.tolist()
labels = [item.get_text() for item in ax.get_yticklabels()]
labels_abv = []

for i, label in enumerate(labels):
    new_label = f'{label} [{upper_abv[i]} ABV]'
    labels_abv.append(new_label)
labels_abv
Out[135]:
['Rare Bourbon County Stout [13.0 ABV]',
 'Green Flash Bourbon Barrel Aged Double Stout [8.9 ABV]',
 'Founders CBS Imperial Stout [10.6 ABV]',
 'Three Sheets Barley Wine [10.0 ABV]',
 'Notorius IPA3 [12.0 ABV]',
 'Sea Monster (3 Sheets Rum Oak Chip Aged) [10.0 ABV]',
 'King Henry [13.4 ABV]',
 'AleSmith Speedway Stout - Kopi Luwak [12.0 ABV]',
 'Wooden Hell [9.5 ABV]',
 'Imperial Eclipse Stout - Pappy Van Winkle [9.5 ABV]',
 'Royal Oil [12.5 ABV]',
 'Galaxy Imperial Single Hop IPA [8.0 ABV]',
 'Founders KBS (Kentucky Breakfast Stout) [11.2 ABV]',
 'Darkness - Bourbon Barrel Aged [10.3 ABV]',
 'Double Sunshine IPA [8.0 ABV]',
 "Marshal Zhukov's Imperial Stout - Zhukov's Final Push [11.5 ABV]",
 'The Abyss [11.0 ABV]',
 'Samuel Adams Millennium [21.0 ABV]',
 'Birth Of Tragedy [11.0 ABV]',
 'Russian Roulette [9.3 ABV]',
 'Bourbon County Brand Coffee Stout [13.0 ABV]',
 "Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel [11.5 ABV]",
 "Bell's Black Note Stout [11.5 ABV]",
 "Hunahpu's Imperial Stout - Rum Barrel Aged [11.5 ABV]",
 "Hunahpu's Imperial Stout [11.5 ABV]"]
In [136]:
# replot
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

ax = sns.pointplot(data=upperqrt_abv, x='weighted_review', y='beer_name', order=top25_upper_abv.beer_name)

plt.grid(True)
plt.title('Top Rated High ABV Beers' , fontsize=14)
plt.ylabel('Beer Name')
plt.xlabel('Average Rating')
ax.set_yticklabels(labels_abv, fontdict={'fontsize':12, 'fontweight':600});

That looks better and a bit more informative in my opinion.

This plot shows us the top 25 rated beers in the upper quartile of ABV. This would be useful for anyone who is interested in drinking beers with above average ABV, while still taking into consideration other variables such as taste, feel, smell, and look of their beer.


Next, I want to do the same anlaysis with the lower quartile of beer_abv. This will attempt to answer the question: What low ABV beers have the highest ratings? In other words, what good beer can I drink for an extended amount of time, for example over the course of a day, without getting too intoxicated.

In [137]:
# What ABV value defines the lower quratile?
beer_unique.beer_abv.describe()
Out[137]:
count    9590.000000
mean        6.789241
std         2.170388
min         0.500000
25%         5.200000
50%         6.200000
75%         8.000000
max        41.000000
Name: beer_abv, dtype: float64
In [138]:
lowerqrt_abv = beer_df_clean.query('beer_abv <= 5.2')
lowerqrt_abv
Out[138]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.75 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.01 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.68 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
5 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 4.04 4.5 3.5 5.0 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563914 54635 Solidarity English Dark Mild Ale 3.8 4.07 4.5 4.0 3.5 4.0 4.0 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563915 54635 Solidarity English Dark Mild Ale 3.8 4.08 5.0 3.5 4.0 4.0 4.0 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563916 54635 Solidarity English Dark Mild Ale 3.8 3.85 4.0 4.0 4.0 4.5 3.5 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563917 54635 Solidarity English Dark Mild Ale 3.8 4.33 4.5 4.0 4.5 4.0 4.5 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731
563924 72212 Bourbon Oaked Solidarity English Dark Mild Ale 3.8 3.23 3.5 3.5 4.0 2.5 3.0 Eagle Rock Brewery micro Los Angeles California -118.244038 34.114731

103645 rows × 16 columns

In [139]:
top25_lower_abv = lowerqrt_abv.groupby('beer_name').weighted_review.agg(['count', 'mean'])
top25_lower_abv = top25_lower_abv.query('count >= 10').sort_values('mean', ascending=False).reset_index().head(25)
top25_lower_abv = top25_lower_abv.merge(beer_unique.query('beer_name in @top25_lower_abv.beer_name'), on='beer_name')\
                  [['beer_name', 'count', 'mean', 'beer_abv']]
top25_lower_abv
Out[139]:
beer_name count mean beer_abv
0 Carnie Fire 10 4.530000 5.0
1 Southampton Berliner Weisse 41 4.512927 2.0
2 Apollo Single Hop IPA 10 4.500000 5.2
3 Citra-delic Summer Wheat 11 4.476364 5.0
4 Fresh Hop 2006 13 4.426923 4.6
5 Tartare 11 4.377273 4.0
6 Great Lakes Wolfhound Stout 44 4.349545 4.8
7 Edward Pale Ale 113 4.344336 5.2
8 Black Snow Coffee Porter 14 4.337143 5.0
9 Solstice Pale Ale 15 4.306000 5.2
10 Blue Point Extra Special Bitter 35 4.274000 5.0
11 Amarillo Amber 13 4.269231 5.2
12 Simcoe 11 4.253636 5.2
13 Southampton Abbey Single 19 4.241579 4.5
14 Southampton Pale Ale 12 4.235000 5.2
15 Amarillo Single Hop Pale Ale 30 4.231000 5.0
16 32/50 91 4.221209 4.8
17 Unfiltered Double Barrel Ale 64 4.216875 5.0
18 English Malt Porter 12 4.206667 5.2
19 Briscoe 12 4.200000 5.0
20 Coffee Bender 1254 4.163062 5.1
21 Founders Frangelic Stout 40 4.158500 4.2
22 Wee Geech 10 4.155000 4.0
23 Brooklyn Sustainable Porter 23 4.151304 5.2
24 Ichabod Ale 2009 57 4.150526 5.0
In [140]:
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

ax1 = sns.pointplot(data=lowerqrt_abv, x='weighted_review', y='beer_name', order=top25_lower_abv.beer_name)

plt.grid(True)
plt.title('Top Rated Low ABV Beers' , fontsize=14)
plt.ylabel('Beer Name')
plt.xlabel('Average Rating')

lower_abv = top25_lower_abv.beer_abv.tolist()
labels_lower = [item.get_text() for item in ax1.get_yticklabels()]
labels_low_abv = [f'{label} [{lower_abv[i]} ABV]' for i, label in enumerate(labels_lower)]

ax1.set_yticklabels(labels_low_abv, fontdict={'fontsize':12, 'fontweight':600});

One interesting thing to note between both the lower and upper quartile ABV plots, is that it seems as if there is a "sweet spot" for high ratings in both quartiles. For the lower quartile, beers with ABV between 4 and 5.2 dominate the list, apart from "Southampton Berliner Weisse" which ranks second. For the upper quartile, the same is true for beers between 8 and 12 ABV.

Finally, let's take a look at beers in the IQR/between the lower and upper quartiles. Maybe someone just wants to know the highest rated 'medium' strength beers, in terms of ABV. So let's go ahead and do that.

In [141]:
# What ABV value defines the median quratile?
beer_unique.beer_abv.describe()
Out[141]:
count    9590.000000
mean        6.789241
std         2.170388
min         0.500000
25%         5.200000
50%         6.200000
75%         8.000000
max        41.000000
Name: beer_abv, dtype: float64
In [142]:
medianqrt_abv = beer_df_clean.query('5.2 < beer_abv < 8.0')
medianqrt_abv
Out[142]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
10 10789 Caldera Oatmeal Stout Oatmeal Stout 7.2 2.97 3.0 3.0 2.5 3.0 3.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
11 10789 Caldera Oatmeal Stout Oatmeal Stout 7.2 1.96 2.0 1.5 2.5 2.5 2.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
12 12386 Caldera OBF 15 American Pale Lager 5.6 3.76 4.0 3.0 4.0 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
13 58046 Rauch Ür Bock Rauchbier 7.4 4.36 4.5 4.5 3.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.75 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

256945 rows × 16 columns

In [143]:
medianqrt_abv.beer_abv.describe()
Out[143]:
count    256945.000000
mean          6.346707
std           0.722926
min           5.210000
25%           5.700000
50%           6.200000
75%           7.000000
max           7.900000
Name: beer_abv, dtype: float64
In [144]:
top25_median_abv = medianqrt_abv.groupby('beer_name').weighted_review.agg(['count', 'mean'])
top25_median_abv = top25_median_abv.query('count >= 10').sort_values('mean', ascending=False).reset_index().head(25)
top25_median_abv = top25_median_abv.merge(beer_unique.query('beer_name in @top25_median_abv.beer_name'), on='beer_name')\
                  [['beer_name', 'count', 'mean', 'beer_abv']]
top25_median_abv['beer_abv'] = top25_median_abv.beer_abv.round(1)
top25_median_abv
Out[144]:
beer_name count mean beer_abv
0 Hoppy Birthday 65 4.552154 5.2
1 Ashy Cynic 18 4.543333 6.7
2 Founders Syrup Del Noche 12 4.501667 7.5
3 Simcoe Pale Ale 17 4.462941 5.9
4 Civil Disobedience #1 16 4.455000 7.0
5 Black Marlin Porter With Cocoa Nibs 30 4.452000 6.0
6 SuperShow 69 18 4.450000 6.9
7 Columbus Pale Ale 16 4.439375 6.0
8 Everett Robust Porter 89 4.437528 7.5
9 Three Hour Tour 57 4.430877 6.1
10 Big Ticket Double IPA 10 4.425000 7.6
11 Sculpin India Pale Ale 8112 4.417885 7.0
12 Black Marlin Porter (Special Sour Version) 12 4.415000 5.5
13 Masala Mama India Pale Ale 662 4.402764 5.9
14 Hill Farmstead Nelson Sauvin IPA 11 4.400000 5.5
15 RPM IPA 40 4.394500 7.5
16 Triple Play IPA 16 4.393750 6.0
17 Imperial Coconut Porter 11 4.386364 7.9
18 King's Slipper 12 4.374167 6.2
19 Great Lakes Honey Ale 18 4.366667 7.5
20 Melrose IPA 13 4.366154 7.2
21 Event Horizon Cascadian Dark Ale 22 4.362727 7.0
22 Hill Farmstead Citra IPA 26 4.360769 5.5
23 Allagash Coolship Red 28 4.355000 5.7
24 O'Brien's IPA 53 4.352642 6.2

Hoppy Birthday isn't rounding properly. It has an ABV value of 5.25, but is rounding to 5.2 due to a floating point limitation. No other value has this problem, so let's just fix Hoppy Birthday.

In [145]:
# confirm ABV value of Hoppy Birthday. 
beer_df_clean.query("beer_name == 'Hoppy Birthday'").beer_abv.head(1)
Out[145]:
293961    5.25
Name: beer_abv, dtype: float64
In [146]:
top25_median_abv.at[0, 'beer_abv'] = 5.3
top25_median_abv
Out[146]:
beer_name count mean beer_abv
0 Hoppy Birthday 65 4.552154 5.3
1 Ashy Cynic 18 4.543333 6.7
2 Founders Syrup Del Noche 12 4.501667 7.5
3 Simcoe Pale Ale 17 4.462941 5.9
4 Civil Disobedience #1 16 4.455000 7.0
5 Black Marlin Porter With Cocoa Nibs 30 4.452000 6.0
6 SuperShow 69 18 4.450000 6.9
7 Columbus Pale Ale 16 4.439375 6.0
8 Everett Robust Porter 89 4.437528 7.5
9 Three Hour Tour 57 4.430877 6.1
10 Big Ticket Double IPA 10 4.425000 7.6
11 Sculpin India Pale Ale 8112 4.417885 7.0
12 Black Marlin Porter (Special Sour Version) 12 4.415000 5.5
13 Masala Mama India Pale Ale 662 4.402764 5.9
14 Hill Farmstead Nelson Sauvin IPA 11 4.400000 5.5
15 RPM IPA 40 4.394500 7.5
16 Triple Play IPA 16 4.393750 6.0
17 Imperial Coconut Porter 11 4.386364 7.9
18 King's Slipper 12 4.374167 6.2
19 Great Lakes Honey Ale 18 4.366667 7.5
20 Melrose IPA 13 4.366154 7.2
21 Event Horizon Cascadian Dark Ale 22 4.362727 7.0
22 Hill Farmstead Citra IPA 26 4.360769 5.5
23 Allagash Coolship Red 28 4.355000 5.7
24 O'Brien's IPA 53 4.352642 6.2
In [173]:
# plot
plt.figure(figsize=(12,15))
sns.set_theme(style='darkgrid')

ax2 = sns.pointplot(data=medianqrt_abv, x='weighted_review', y='beer_name', order=top25_median_abv.beer_name)

plt.grid(True)
plt.title('Top Rated Medium ABV Beers' , fontsize=14)
plt.ylabel('Beer Name')
plt.xlabel('Average Rating')

median_abv = top25_median_abv.beer_abv.tolist()
labels_median = [item.get_text() for item in ax2.get_yticklabels()]
labels_low_abv = [f'{label} [{median_abv[i]} ABV]' for i, label in enumerate(labels_median)]

ax2.set_yticklabels(labels_low_abv, fontdict={'fontsize':12, 'fontweight':600});

Sweet, looks good. This plot represents the top rated beers with average ABV values.


In [148]:
beer_df_clean
Out[148]:
beer_id beer_name beer_style beer_abv weighted_review review_overall review_aroma review_appearance review_palate review_taste brewery_name brewery_type city state longitude latitude
0 64883 Cauldron DIPA American Double / Imperial IPA 7.7 4.32 4.0 4.5 4.0 4.0 4.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
1 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.35 3.0 3.5 3.5 3.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
2 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.75 3.5 3.5 3.5 4.0 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
3 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.01 3.0 2.5 3.5 2.0 3.5 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
4 52159 Caldera Ginger Beer Herbed / Spiced Beer 4.7 3.68 4.0 3.0 3.5 3.5 4.0 Caldera Brewing Company micro Ashland Oregon -122.663374 42.183738
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
563961 58076 Amber Wave American Amber / Red Ale 5.4 3.46 3.5 3.0 4.0 4.0 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563962 58078 Shipwrecked Stout American Stout 6.5 3.31 3.0 3.0 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563963 58078 Shipwrecked Stout American Stout 6.5 3.75 3.5 3.5 3.5 4.0 4.0 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563964 58385 Belgian Blonde Anniversary Ale Belgian Pale Ale 6.5 3.63 4.0 3.5 4.0 3.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255
563965 69416 Belgian Dubbel Dubbel 6.0 3.27 3.0 3.5 3.0 2.5 3.5 Pacific Beach Ale House brewpub San Diego California -117.255265 32.794255

563517 rows × 16 columns

In [149]:
top25_upper_abv
Out[149]:
beer_name count mean beer_abv
0 Rare Bourbon County Stout 249 4.649317 13.0
1 Green Flash Bourbon Barrel Aged Double Stout 14 4.641429 8.9
2 Founders CBS Imperial Stout 1274 4.616342 10.6
3 Three Sheets Barley Wine 12 4.600000 10.0
4 Notorius IPA3 22 4.592727 12.0
5 Sea Monster (3 Sheets Rum Oak Chip Aged) 12 4.585000 10.0
6 King Henry 98 4.573980 13.4
7 AleSmith Speedway Stout - Kopi Luwak 23 4.568261 12.0
8 Wooden Hell 76 4.553289 9.5
9 Imperial Eclipse Stout - Pappy Van Winkle 49 4.550408 9.5
10 Royal Oil 15 4.542667 12.5
11 Galaxy Imperial Single Hop IPA 76 4.542500 8.0
12 Founders KBS (Kentucky Breakfast Stout) 3864 4.537521 11.2
13 Darkness - Bourbon Barrel Aged 30 4.532667 10.3
14 Double Sunshine IPA 85 4.531647 8.0
15 Marshal Zhukov's Imperial Stout - Zhukov's Fin... 19 4.525263 11.5
16 The Abyss 1412 4.518746 11.0
17 Samuel Adams Millennium 22 4.517273 21.0
18 Birth Of Tragedy 29 4.515517 11.0
19 Russian Roulette 12 4.509167 9.3
20 Bourbon County Brand Coffee Stout 456 4.508026 13.0
21 Hunahpu's Imperial Stout - Laird's Apple Brand... 63 4.503333 11.5
22 Bell's Black Note Stout 130 4.500846 11.5
23 Hunahpu's Imperial Stout - Rum Barrel Aged 18 4.485556 11.5
24 Hunahpu's Imperial Stout 429 4.473916 11.5

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The realtionship looked at during this section was the correlation between review_overall and the variables beer_abv, review_aroma, review_appearance, review_palate, and review_taste, with respect to brewery_type. We found that 'large' breweries have the highest correlations across all variables, aside from 'proprietor' breweries who only had 117 data points with the potential for a large margin of error.

Next, we took a look at a few multivariate rankings instead of correlations. These variables were beer_name, weighted_review, and beer_abv, where each beer was grouped into its respective quartile in terms of beer_abv. The aim here was to answer practical questions a beer drinker might have about the best beers to drink for different situations. For example someone might ask, what is the best/highest rated beer to drink over the course of the day? To answer this question, you may want to look for the highest rated beers that are in the lower quartile of beer_abv. In other words, you would want a beer with a high rating and a relatively low ABV content, so that you could drink it over the course of a day.

These were the top rated beers for each beer_abv quartile:

  • Lower Quartile: Carnie Fire at 5.0 ABV
  • Interquartile Range: Hoppy Birthday at 5.3 ABV
  • Upper Quartile: Rare Bourbon County Stout at 13.0 ABV

Were there any interesting or surprising interactions between features?

One intersting thing found in the rankings of lower quartile beers, was that the 2nd highest rated beer 'Southampton Berliner Weisse' has 2.0 ABV content, while having 4x the number of reviews as Carnie Fire (1st ranked).

When it came to the upper quartile of beer_abv, the top 3 ranked beers were all stouts. In addition, 16 of the top 25 beers in the upper quartile were either stouts or stout variations. From the 'Top Rated High ABV Beers' plot, there are 12 beers that have 'stout' in their name, while 4 do not, Russian Roulette, Sea Monster, Darkness - Bourbon Barrel Aged, and The Abyss. With well over half of the highest rated, high ABV beers being stouts, recommending a stout to someone looking for high ABV beers would be a safe bet.